mirror of https://github.com/apache/lucene.git
SOLR-12770: make docs on shards param a little more clear, fix a couple typos
This commit is contained in:
parent
b4b9c39392
commit
e1959956c5
|
@ -22,46 +22,48 @@ The chosen replica acts as an aggregator: it creates internal requests to random
|
|||
|
||||
== Limiting Which Shards are Queried
|
||||
|
||||
While one of the advantages of using SolrCloud is the ability to query very large collections distributed among various shards, in some cases <<shards-and-indexing-data-in-solrcloud.adoc#document-routing,you may know that you are only interested in results from a subset of your shards>>. You have the option of searching over all of your data or just parts of it.
|
||||
While one of the advantages of using SolrCloud is the ability to query very large collections distributed across various shards, in some cases you may have configured Solr so you know <<shards-and-indexing-data-in-solrcloud.adoc#document-routing,you are only interested in results from a specific subset of shards>>. You have the option of searching over all of your data or just parts of it.
|
||||
|
||||
Querying all shards for a collection should look familiar; it's as though SolrCloud didn't even come into play:
|
||||
A query across all shards for a collection is simply a query that does not define a `shards` parameter:
|
||||
|
||||
[source,text]
|
||||
----
|
||||
http://localhost:8983/solr/gettingstarted/select?q=*:*
|
||||
----
|
||||
|
||||
If, on the other hand, you wanted to search just one shard, you can specify that shard by its logical ID, as in:
|
||||
If you want to search just one shard, use the `shards` parameter to specify the shard by its logical ID, as in:
|
||||
|
||||
[source,text]
|
||||
----
|
||||
http://localhost:8983/solr/gettingstarted/select?q=*:*&shards=shard1
|
||||
----
|
||||
|
||||
If you want to search a group of shard Ids, you can specify them together:
|
||||
If you want to search a group of shards, you can specify each shard separated by a comma in one request:
|
||||
|
||||
[source,text]
|
||||
----
|
||||
http://localhost:8983/solr/gettingstarted/select?q=*:*&shards=shard1,shard2
|
||||
----
|
||||
|
||||
In both of the above examples, the shard Id(s) will be used to pick a random replica of that shard.
|
||||
In both of the above examples, only the specified shards are queried, but the request for each shard may go to any replica of that shard, chosen at random.
|
||||
|
||||
Alternatively, you can specify the explicit replicas you wish to use in place of a shard Ids:
|
||||
Alternatively, you can specify a list of replicas you wish to use in place of shard IDs by separating the replica IDs with commas:
|
||||
|
||||
[source,text]
|
||||
----
|
||||
http://localhost:8983/solr/gettingstarted/select?q=*:*&shards=localhost:7574/solr/gettingstarted,localhost:8983/solr/gettingstarted
|
||||
----
|
||||
|
||||
Or you can specify a list of replicas to choose from for a single shard (for load balancing purposes) by using the pipe symbol (|):
|
||||
Or you can specify a list of replicas to choose from for a single shard (for load balancing purposes) by using the pipe symbol (|) between different replica IDs:
|
||||
|
||||
[source,text]
|
||||
----
|
||||
http://localhost:8983/solr/gettingstarted/select?q=*:*&shards=localhost:7574/solr/gettingstarted|localhost:7500/solr/gettingstarted
|
||||
----
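The separators used in these examples compose in a regular way: commas separate shards, and pipes separate alternative replicas for one shard. The following Python sketch builds a `shards` value and a request URL from a nested list. It is only an illustration; the helper names `build_shards_param` and `build_select_url` are hypothetical and not part of Solr or any Solr client.

```python
from urllib.parse import urlencode


def build_shards_param(shards):
    """Build a `shards` value from a list of shards.

    Each shard is a list of one or more entries (a logical shard ID or
    replica addresses). Alternatives for one shard are joined with '|',
    and shards are joined with ','.
    """
    return ",".join("|".join(replicas) for replicas in shards)


def build_select_url(base_url, query, shards):
    """Return a /select URL for `query` restricted to `shards`.

    `base_url` is assumed to be the collection URL, for example
    http://localhost:8983/solr/gettingstarted.
    """
    params = {"q": query, "shards": build_shards_param(shards)}
    # Leave the separator characters unescaped so the URL stays readable.
    return f"{base_url}/select?{urlencode(params, safe='*:,|/')}"


if __name__ == "__main__":
    url = build_select_url(
        "http://localhost:8983/solr/gettingstarted",
        "*:*",
        [
            ["shard1"],
            ["localhost:7574/solr/gettingstarted",
             "localhost:7500/solr/gettingstarted"],
        ],
    )
    print(url)
```

Keeping the nested list as the source of truth avoids hand-editing long `shards` strings when replicas are added or removed.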
|
||||
|
||||
And of course, you can specify a list of shards (separated by commas) each defined by a list of replicas (seperated by pipes). In this example, 2 shards are queried, the first being a random replica from shard1, the second being a random replica from the explicit pipe delimited list:
|
||||
Finally, you can specify a list of shards (separated by commas), each defined by a list of replicas (separated by pipes).
|
||||
|
||||
In the following example, two shards are queried: the first is a random replica from shard1, and the second is a random replica from the explicit pipe-delimited list:
|
||||
|
||||
[source,text]
|
||||
----
|
||||
|
@ -70,9 +72,11 @@ http://localhost:8983/solr/gettingstarted/select?q=*:*&shards=shard1,localhost:7
|
|||
|
||||
== Configuring the ShardHandlerFactory
|
||||
|
||||
You can directly configure aspects of the concurrency and thread-pooling used within distributed search in Solr. This allows for finer grained control and you can tune it to target your own specific requirements. The default configuration favors throughput over latency.
|
||||
For finer-grained control, you can directly configure and tune aspects of the concurrency and thread-pooling used within distributed search in Solr. The default configuration favors throughput over latency.
|
||||
|
||||
To configure the standard search handler, provide a configuration like this in `solrconfig.xml`:
|
||||
This is done by defining a `shardHandler` in the configuration for your search handler.
|
||||
|
||||
To add a `shardHandler` to the standard search handler, provide a configuration in `solrconfig.xml`, as in this example:
|
||||
|
||||
[source,xml]
|
||||
----
|
||||
|
@ -112,10 +116,16 @@ If specified, the thread pool will use a backing queue instead of a direct hando
|
|||
Chooses the JVM specifics dealing with fair policy queuing. If enabled, distributed searches will be handled in a first-in, first-out fashion at a cost to throughput. If disabled, throughput will be favored over latency. The default is `false`.
|
||||
|
||||
`shardsWhitelist`::
|
||||
If specified, this lists limits what nodes can be requested in the `shards` request parameter. In cloud mode this whitelist is automatically configured to include all live nodes in the cluster. In standalone mode the whitelist defaults to empty (sharding not allowed). If you need to disable this feature for backwards compatibility, you can set the system property `solr.disable.shardsWhitelist=true`. The value of this parameter is a comma separated list of the nodes that will be whitelisted, i.e.:
|
||||
If specified, this list limits which nodes can be requested in the `shards` request parameter.
|
||||
+
|
||||
In SolrCloud mode this whitelist is automatically configured to include all live nodes in the cluster.
|
||||
+
|
||||
In standalone mode the whitelist defaults to empty (sharding not allowed).
|
||||
+
|
||||
If you need to disable this feature for backwards compatibility, you can set the system property `solr.disable.shardsWhitelist=true`. The value of the `shardsWhitelist` parameter is a comma-separated list of the nodes that will be whitelisted, for example:
|
||||
`10.0.0.1:8983/solr,10.0.0.1:8984/solr`.
|
||||
|
||||
NOTE: In cloud mode, if at least one node is included in the whitelist, then the live_nodes will no longer be used as source for the list. This means that, if you need to do a cross-cluster request using the `shards` parameter in cloud mode (in addition to regular within-cluster requests), you'll need to add all nodes (local cluster + remote nodes) to the whitelist.
|
||||
+
|
||||
NOTE: In SolrCloud mode, if at least one node is included in the whitelist, then `live_nodes` will no longer be used as the source for the list. This means that if you need to do a cross-cluster request using the `shards` parameter in SolrCloud mode (in addition to regular within-cluster requests), you'll need to add all nodes (local cluster + remote nodes) to the whitelist.
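To make the effect of an explicit whitelist concrete, the Python sketch below checks which replica addresses in a `shards` value fall outside a comma-separated whitelist. It only illustrates the idea and is not Solr's implementation: the function names and the prefix-matching rule are assumptions, and it expects explicit replica addresses rather than logical shard IDs.

```python
def parse_whitelist(value):
    """Split a comma-separated whitelist such as
    '10.0.0.1:8983/solr,10.0.0.1:8984/solr' into a set of node addresses."""
    return {node.strip() for node in value.split(",") if node.strip()}


def disallowed_replicas(shards_param, whitelist):
    """Return the replica addresses in `shards_param` not covered by `whitelist`.

    Assumption for this sketch: a replica is covered when its address equals
    a whitelisted node or starts with that node followed by '/'.
    """
    replicas = [
        replica
        for shard in shards_param.split(",")   # commas separate shards
        for replica in shard.split("|")        # pipes separate alternatives
    ]
    return [
        replica
        for replica in replicas
        if not any(
            replica == node or replica.startswith(node + "/")
            for node in whitelist
        )
    ]


if __name__ == "__main__":
    whitelist = parse_whitelist("10.0.0.1:8983/solr,10.0.0.1:8984/solr")
    shards = "10.0.0.1:8983/solr/core1|10.0.0.2:8983/solr/core1,10.0.0.1:8984/solr/core1"
    print(disallowed_replicas(shards, whitelist))
```

A check like this can help when assembling a whitelist for cross-cluster requests, where every local and remote node has to be listed explicitly.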
|
||||
|
||||
== Configuring statsCache (Distributed IDF)
|
||||
|
||||
|
|
|
@ -60,14 +60,15 @@ The following components support distributed search:
|
|||
* The *Debug* component, which helps with debugging.
|
||||
|
||||
=== Shards Whitelist
|
||||
What nodes are allowed in the `shards` parameter is configurable through the `shardsWhitelist` property in `solr.xml`. This whitelist is automatically configured for SolrCloud but needs explicit configuration for master/slave mode. Read more details in <<distributed-requests.adoc#configuring-the-shardhandlerfactory>>.
|
||||
|
||||
The nodes allowed in the `shards` parameter are configurable through the `shardsWhitelist` property in `solr.xml`. This whitelist is automatically configured for SolrCloud but needs explicit configuration for master/slave mode. Read more details in the section <<distributed-requests.adoc#configuring-the-shardhandlerfactory,Configuring the ShardHandlerFactory>>.
|
||||
|
||||
== Limitations to Distributed Search
|
||||
|
||||
Distributed searching in Solr has the following limitations:
|
||||
|
||||
* Each document indexed must have a unique key.
|
||||
* If Solr discovers duplicate document IDs, Solr selects the first document and discards subsequent ones.
|
||||
* If Solr discovers duplicate document IDs, Solr selects the first document and discards subsequent documents.
|
||||
* The index for distributed searching may become momentarily out of sync if a commit happens between the first and second phase of the distributed search. This might cause a situation where a document that once matched a query and was subsequently changed may no longer match the query but will still be retrieved. This situation is expected to be quite rare, however, and is only possible for a single query request.
|
||||
* The number of shards is limited by the number of characters allowed in the URI of a GET request; most Web servers generally support at least 4000 characters, but many servers limit URI length to reduce their vulnerability to Denial of Service (DoS) attacks.
|
||||
* Shard information can be returned with each document in a distributed search by including `fl=id, [shard]` in the search request. This returns the shard URL.
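As a rough illustration of the last two limitations, the Python sketch below builds a distributed query that requests `[shard]` in the field list and checks the resulting URL against a character budget. The 4000-character figure comes from the text above; the function names, the host names, and the budget constant are assumptions for this example, and the real limit depends on the servers and proxies involved.

```python
from urllib.parse import urlencode

# Assumed budget, taken from the "at least 4000 characters" figure above.
URI_CHAR_BUDGET = 4000


def build_sharded_query(base_url, query, shard_urls):
    """Build a GET URL that queries `shard_urls` and returns the shard of each hit."""
    params = {
        "q": query,
        "shards": ",".join(shard_urls),
        "fl": "id,[shard]",  # ask for the document ID and its shard URL
    }
    return f"{base_url}/select?{urlencode(params)}"


def fits_uri_budget(url, budget=URI_CHAR_BUDGET):
    """Return True if the URL length is within the assumed budget."""
    return len(url) <= budget


if __name__ == "__main__":
    # Hypothetical hosts, used only to show how the URL grows with shard count.
    few = [f"host{i}:8983/solr/core1" for i in range(3)]
    many = [f"host{i}:8983/solr/core1" for i in range(200)]
    for shards in (few, many):
        url = build_sharded_query("http://localhost:8983/solr/core1", "*:*", shards)
        print(len(shards), "shards ->", len(url), "characters, fits:", fits_uri_budget(url))
```

Because percent-encoding expands characters such as `:` and `/`, the encoded URL grows faster than the raw shard list, which is worth accounting for when estimating how many shards fit in one request.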
|
||||
|
|
|
@ -292,8 +292,12 @@ The TermsComponent also supports distributed indexes. For the `/terms` request h
|
|||
|
||||
`shards`::
|
||||
Specifies the shards in your distributed indexing configuration. For more information about distributed indexing, see <<distributed-search-with-index-sharding.adoc#distributed-search-with-index-sharding,Distributed Search with Index Sharding>>.
|
||||
+
|
||||
The `shards` parameter is subject to a host whitelist that has to be configured in the component's parameters using the configuration key `shardsWhitelist` and the list of hosts as values.
|
||||
+
|
||||
By default the whitelist will be populated with all live nodes when running in SolrCloud mode. If you need to disable this feature for backwards compatibility, you can set the system property `solr.disable.shardsWhitelist=true`.
|
||||
+
|
||||
See the section <<distributed-requests.adoc#configuring-the-shardhandlerfactory,Configuring the ShardHandlerFactory>> for more information about how the whitelist works.
|
||||
|
||||
`shards.qt`::
|
||||
Specifies the request handler Solr uses for requests to shards.
|
||||
|
||||
Same as with regular distributed search, the `shards` parameter is subject to a host whitelist that has to be configured in the component init parameters using the configuration key `shardsWhitelist` and the list of hosts as values. In the same way as with distributed search, the whitelist will be populated to all live nodes by default when running in SolrCloud mode. If you need to disable this feature for backwards compatibility, you can set the system property `solr.disable.shardsWhitelist=true`.
|
||||
|
|