mirror of https://github.com/apache/lucene.git
Ref Guide: standardize i.e., e.g., spellings; fix typos
This commit is contained in:
parent
043c5dff6f
commit
153d7bcfee
|
@ -141,7 +141,7 @@ Step 2: Get the `sha512` hash of the jar
|
|||
openssl dgst -sha512 runtimelibs.jar
|
||||
----
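Where `openssl` is not available, the same digest can be computed programmatically. The following is a minimal sketch, not part of Solr; the helper name `sha512_of_file` is hypothetical, and the file is read in chunks so a large jar does not need to fit in memory.

```python
import hashlib

def sha512_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-512 digest of the file at `path`, reading in chunks."""
    digest = hashlib.sha512()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps reading until f.read() returns b"" (end of file)
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Calling `sha512_of_file("runtimelibs.jar")` should yield the same hex string that the `openssl dgst -sha512` command prints after the file name.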
|
||||
|
||||
Step 3 : Start solr with runtime lib enabled
|
||||
Step 3: Start Solr with runtime lib enabled
|
||||
|
||||
[source,bash]
|
||||
----
|
||||
|
|
|
@ -155,27 +155,32 @@ Unlike the CLUSTERPROP command on the <<cluster-node-management.adoc#clusterprop
|
|||
./server/scripts/cloud-scripts/zkcli.sh -zkhost 127.0.0.1:2181 -cmd clusterprop -name urlScheme -val https
|
||||
----
|
||||
|
||||
=== Export data from a collection to a file
|
||||
=== Export Data from a Collection to a File
|
||||
|
||||
This command downloads documents from all shards in parallel and write the documents to a single file. The supported format are `jsonl` and `javabin`
|
||||
This command downloads documents from all shards in parallel and writes the documents to a single file. The supported formats are `jsonl` and `javabin`.
|
||||
|
||||
Arguments are:
|
||||
|
||||
`-url` :: (Requred parameter) Url of the collection
|
||||
`-out` :: (Optional) Name of the file to write to. default file name is `<collection-name>.json` . If the file name ends with `.json.gz` , the output is a zip file of json
|
||||
`-format` :: (Optional) Supported values are json/javabin
|
||||
`-limit` :: (Optional) No:of docs to export. By default the entire collection is exported
|
||||
`-fields` :: (Optional) Fields to be exported. By default, all fields are exported
|
||||
`-url`:: (Required) The URL of the collection.
|
||||
|
||||
`-out`:: (Optional) Name of the file to write to. The default file name is `<collection-name>.json`. If the file name ends with `.json.gz`, the output is a gzip-compressed JSON file.
|
||||
|
||||
`-format`:: (Optional) Supported values are `json` or `javabin`.
|
||||
|
||||
`-limit`:: (Optional) Number of documents to export. By default, the entire collection is exported.
|
||||
|
||||
`-fields`:: (Optional) Fields to be exported. By default, all fields are exported.
|
||||
|
||||
Example 1: Export all documents in a collection `gettingstarted` into a file called `gettingstarted.json`:
|
||||
|
||||
example 1: Export all documents in a collection `gettingstarted` into a file called `gettingstarted.json`
|
||||
[source,bash]
|
||||
----
|
||||
bin/solr export -url http://localhost:8983/solr/gettingstarted
|
||||
----
|
||||
|
||||
example 2: export 1M docs of collection `gettingstarted` into a file called `1MDocs.json.gz` as a zipped json file
|
||||
Example 2: Export 1M docs of collection `gettingstarted` into a file called `1MDocs.json.gz` as a zipped JSON file:
|
||||
|
||||
[source,bash]
|
||||
----
|
||||
bin/solr export -url http://localhost:8983/solr/gettingstarted -limit 1000000 -out 1MDocs.json.gz
|
||||
----
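The exported file can then be consumed by other tools. The sketch below is one way to read it back in Python. It assumes the `jsonl` output holds one JSON document per line and that a `.json.gz` file is gzip-compressed; the helper name `read_exported_docs` is hypothetical.

```python
import gzip
import json

def read_exported_docs(path):
    """Yield one dict per line from a JSON-lines export, gzipped or plain."""
    # Assumption: files ending in .gz are gzip-compressed, all others are plain text.
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:  # skip blank lines
                yield json.loads(line)
```

Because the function is a generator, a large export such as `1MDocs.json.gz` is processed one document at a time instead of being loaded whole.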
|
||||
|
||||
|
|
|
@ -16,7 +16,7 @@
|
|||
// specific language governing permissions and limitations
|
||||
// under the License.
|
||||
|
||||
Configsets are a set of configuration files used in a Solr installation: `solrconfig.xml`, the schema, and then <<resource-loading.adoc#resource-loading,resources>> like language files, `synonyms.txt`, DIH-related configuration, and others that are referenced from the config or schema.
|
||||
Configsets are a set of configuration files used in a Solr installation: `solrconfig.xml`, the schema, and then <<resource-loading.adoc#resource-loading,resources>> like language files, `synonyms.txt`, DIH-related configuration, and others.
|
||||
|
||||
Such configuration, _configsets_, can be named and then referenced by collections or cores, possibly with the intent to share them to avoid duplication.
|
||||
|
||||
|
@ -26,7 +26,7 @@ Solr ships with two example configsets located in `server/solr/configsets`, whic
|
|||
|
||||
If you are using Solr in standalone mode, configsets are managed on the filesystem.
|
||||
|
||||
Each Solr core can have it's very own configSet located beneath it in a `<instance_dir>/conf/` dir.
|
||||
Each Solr core can have its very own configset located beneath it in a `<instance_dir>/conf/` directory.
|
||||
Here, it is not named or shared and the word _configset_ isn't found.
|
||||
In Solr's early years, this was _the only way_ it was configured.
|
||||
|
||||
|
|
|
@ -47,7 +47,7 @@ A SolrCloud cluster consists of some "logical" concepts layered on top of some "
|
|||
** The level of redundancy built into the Collection and how fault tolerant the Cluster can be in the event that some Nodes become unavailable.
|
||||
** The theoretical limit in the number of concurrent search requests that can be processed under heavy load.
|
||||
|
||||
WARNING: Make sure the DNS resolution in your cluster is stable, ie.
|
||||
WARNING: Make sure the DNS resolution in your cluster is stable, i.e.,
|
||||
for each live host belonging to a Cluster the host name always corresponds to the
|
||||
same specific IP and physical node. For example, in clusters deployed on AWS this would
|
||||
require setting `preserve_hostname: true` in `/etc/cloud/cloud.cfg`. Changing DNS resolution
|
||||
|
|
|
@ -341,7 +341,7 @@ include::{example-source-dir}JsonRequestApiTest.java[tag=solrj-ipod-query-bool-f
|
|||
|
||||
== Additional Queries
|
||||
|
||||
Multiple additional queries might be specified under `queries` key with all syntax alternatives described above. Every entry might have multiple values in array. Notice that old-style referencing `"{!v=$query_name}"` picks only the first element in array ignoring everything beyond, e.g. if one changes the reference below from `"{!v=$electronic}"` to `"{!v=$manufacturers}"` it's equivalent to querying for `manu:apple`, ignoring the later query. These queries don't impact query result until explicit referencing.
|
||||
Multiple additional queries may be specified under the `queries` key, using any of the syntax alternatives described above. Every entry may have multiple values in an array. Note that old-style referencing `"{!v=$query_name}"` picks only the first element in the array and ignores everything beyond it; e.g., if one changes the reference below from `"{!v=$electronic}"` to `"{!v=$manufacturers}"`, it's equivalent to querying for `manu:apple`, ignoring the later query. These queries don't affect the query result until they are explicitly referenced.
|
||||
|
||||
[source,bash]
|
||||
----
|
||||
|
|
|
@ -40,7 +40,7 @@ Certain plugins or add-ons to plugins require placement here.
|
|||
They will document themselves to say so.
|
||||
|
||||
Solr incorporates Jetty for providing HTTP server functionality.
|
||||
Jetty has some directories that contain `.jar` files for itself and its own plugins / modules or JVM level plugins (e.g. loggers).
|
||||
Jetty has some directories that contain `.jar` files for itself and its own plugins / modules or JVM level plugins (e.g., loggers).
|
||||
Solr plugins won't work in these locations.
|
||||
|
||||
== Lib Directives in SolrConfig
|
||||
|
|
|
@ -192,7 +192,7 @@ This syntax has been removed entirely and if sent to Solr it will now produce an
|
|||
The pattern language is very similar but not the same.
|
||||
Typically, simply update the pattern by changing an uppercase 'Z' to lowercase 'z' and that's it.
|
||||
+
|
||||
For the current recommended set of patterns in schemaless mode, see the section <<schemaless-mode.adoc#schemaless-mode,Schemaless Mode>>, or simply examine the `_default` configSet (found in `server/solr/configsets`).
|
||||
For the current recommended set of patterns in schemaless mode, see the section <<schemaless-mode.adoc#schemaless-mode,Schemaless Mode>>, or simply examine the `_default` configset (found in `server/solr/configsets`).
|
||||
+
|
||||
Also note that the default set of date patterns (formats) have expanded from previous releases to subsume those patterns previously handled by the "extract" contrib (Solr Cell / Tika).
|
||||
|
||||
|
|
|
@ -1117,21 +1117,39 @@ An optional parameter used to determine which of several query implementations s
|
|||
----
|
||||
|
||||
== XCJF Query Parser
|
||||
The Cross Collection Join filter is a query parser plugin that will execute a query against a remote Solr collection to get back a set of join keys that will be used to as a filter query against the local Solr collection. The XCJF query parser will create an XCJFQuery object. The XCJFQuery will first query a remote solr collection and get back a streaming expression result of the join keys. As the join keys are streamed to the node, a bitset of the matching documents in the local index is built up. This avoids keeping the full set of join keys in memory at any given time. This bitset is then inserted into the filter cache upon successful execution as with the normal behavior of the solr filter cache.
|
||||
The Cross Collection Join Filter (XCJF) is a query parser plugin that will execute a query against a remote Solr collection to get back a set of join keys that will be used as a filter query against the local Solr collection.
|
||||
|
||||
If the local index is sharded according to the join key field, the XCJF query can leverage a secondary query parser called the "hash_range" query parser. The hash_range query parser is responsible for returning only the documents that hash to a given range of values. This allows the XCJFQuery to query the remote solr collection and return only the join keys that would match a specific shard in the local solr collection. This has the benefit of making sure that network traffic doesn't increase as the number of shards increases and allows for much greater scalability.
|
||||
The XCJF parser will create an XCJFQuery object.
|
||||
The XCJFQuery will first query a remote Solr collection and get back a streaming expression result of the join keys.
|
||||
As the join keys are streamed to the node, a bitset of the matching documents in the local index is built up.
|
||||
This avoids keeping the full set of join keys in memory at any given time.
|
||||
This bitset is then inserted into the filter cache upon successful execution as with the normal behavior of the Solr filter cache.
|
||||
|
||||
XCJF parser works with both String and Point types of fields. The fields that are being used for the join key must be single value and have docValues enabled. It's advised to shard the local collection by the join key as this allows for the optimization mentioned above to be utilized. The XCJF should not be generally used as part of the "q", but rather it is designed to be used as a filter query "fq" parameter to ensure proper caching. The remote solr collection that is being queried should have a single value field for the join key with docValues enabled. The remote solr collection does not have any specific sharding requirements.
|
||||
If the local index is sharded according to the join key field, the XCJF parser can leverage a secondary query parser called the "hash_range" query parser.
|
||||
The hash_range query parser is responsible for returning only the documents that hash to a given range of values.
|
||||
This allows the XCJFQuery to query the remote Solr collection and return only the join keys that would match a specific shard in the local Solr collection.
|
||||
This has the benefit of making sure that network traffic doesn't increase as the number of shards increases and allows for much greater scalability.
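The effect of the hash-range optimization can be illustrated with a small model. This is only a sketch under stated assumptions: `zlib.crc32` is a stand-in 32-bit hash (not the hash Solr's router uses), the hash space is split evenly, and all function names are hypothetical.

```python
import zlib

def stand_in_hash(key):
    """Stand-in 32-bit hash; NOT the hash function Solr's router uses."""
    return zlib.crc32(key.encode("utf-8"))  # value in [0, 2**32 - 1]

def split_ranges(num_shards, space=2**32):
    """Divide the hash space into contiguous, inclusive (lo, hi) ranges."""
    step = space // num_shards
    return [
        (i * step, space - 1 if i == num_shards - 1 else (i + 1) * step - 1)
        for i in range(num_shards)
    ]

def keys_for_shard(join_keys, hash_range):
    """Keep only the join keys whose hash falls inside this shard's range."""
    lo, hi = hash_range
    return [k for k in join_keys if lo <= stand_in_hash(k) <= hi]

# Hypothetical join keys returned by the remote collection.
remote_keys = [f"user-{i}" for i in range(1000)]
ranges = split_ranges(4)
per_shard = [keys_for_shard(remote_keys, r) for r in ranges]
```

In this model every key is sent to exactly one shard, so the total number of keys transferred stays at `len(remote_keys)` however many shards there are, rather than growing with the shard count.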
|
||||
|
||||
=== XCJF Query Parser definition in solrconfig.xml
|
||||
The XCJF parser works with both String and Point types of fields.
|
||||
The fields that are being used for the join key must be single-valued and have docValues enabled.
|
||||
|
||||
The XCJF has some configuration options that can be specified in the solrconfig.xml
|
||||
It's advised to shard the local collection by the join key as this allows for the optimization mentioned above to be utilized.
|
||||
|
||||
The XCJF parser should generally not be used as part of the `q` parameter; it is designed to be used as a filter query (`fq` parameter) to ensure proper caching.
|
||||
|
||||
The remote Solr collection that is being queried should have a single-valued field for the join key with docValues enabled.
|
||||
|
||||
The remote Solr collection does not have any specific sharding requirements.
|
||||
|
||||
=== XCJF Query Parser Definition in solrconfig.xml
|
||||
|
||||
The XCJF parser has some configuration options that can be specified in `solrconfig.xml`.
|
||||
|
||||
`routerField`::
|
||||
If the documents are routed to shards using the CompositeID router by the join field, then that field name should be specified in the configuration here. This will allow the parser to optimize the resulting HashRange query.
|
||||
|
||||
`solrUrl`::
|
||||
If specified, this array of strings specifies the white listed Solr URLs that you can pass to the solrUrl query parameter. Without this configuration the solrUrl parameter cannot be used. This restriction is necessary to prevent an attacker from using solr to explore the network.
|
||||
If specified, this array of strings specifies the whitelisted Solr URLs that you can pass to the `solrUrl` query parameter. Without this configuration, the `solrUrl` parameter cannot be used. This restriction is necessary to prevent an attacker from using Solr to explore the network.
|
||||
|
||||
[source,xml]
|
||||
----
|
||||
|
@ -1148,31 +1166,36 @@ If specified, this array of strings specifies the white listed Solr URLs that yo
|
|||
=== XCJF Query Parameters
|
||||
|
||||
`collection`::
|
||||
The name of the external Solr collection to be queried to retrieve the set of join key values ( required )
|
||||
The name of the external Solr collection to be queried to retrieve the set of join key values (required).
|
||||
|
||||
`zkHost`::
|
||||
The connection string to be used to connect to Zookeeper. zkHost and solrUrl are both optional parameters, and at most one of them should be specified. If neither of zkHost or solrUrl are specified, the local Zookeeper cluster will be used. ( optional )
|
||||
The connection string to be used to connect to ZooKeeper. `zkHost` and `solrUrl` are both optional parameters, and at most one of them should be specified. If neither `zkHost` nor `solrUrl` is specified, the local ZooKeeper cluster will be used (optional).
|
||||
|
||||
`solrUrl`::
|
||||
The URL of the external Solr node to be queried. Must be a character for character exact match of a whitelisted url. ( optional, disabled by default for security )
|
||||
The URL of the external Solr node to be queried. Must be a character-for-character exact match of a whitelisted URL (optional, disabled by default for security).
|
||||
|
||||
`from`::
|
||||
The join key field name in the external collection ( required )
|
||||
The join key field name in the external collection (required).
|
||||
|
||||
`to`::
|
||||
The join key field name in the local collection
|
||||
The join key field name in the local collection.
|
||||
|
||||
`v`::
|
||||
The query substituted in as a local param. This is the query string that will match documents in the remote collection.
|
||||
|
||||
`routed`::
|
||||
true / false. If true, the XCJF query will use each shard's hash range to determine the set of join keys to retrieve for that shard. This parameter improves the performance of the cross-collection join, but it depends on the local collection being routed by the toField. If this parameter is not specified, the XCJF query will try to determine the correct value automatically.
|
||||
If `true`, the XCJF query will use each shard's hash range to determine the set of join keys to retrieve for that shard.
|
||||
This parameter improves the performance of the cross-collection join, but it depends on the local collection being routed by the `to` field.
|
||||
If this parameter is not specified, the XCJF query will try to determine the correct value automatically.
|
||||
|
||||
`ttl`::
|
||||
The length of time that an XCJF query in the cache will be considered valid, in seconds. Defaults to 3600 (one hour). The XCJF query will not be aware of changes to the remote collection, so if the remote collection is updated, cached XCJF queries may give inaccurate results. After the ttl period has expired, the XCJF query will re-execute the join against the remote collection.
|
||||
The length of time that an XCJF query in the cache will be considered valid, in seconds.
|
||||
Defaults to `3600` (one hour).
|
||||
The XCJF query will not be aware of changes to the remote collection, so if the remote collection is updated, cached XCJF queries may give inaccurate results.
|
||||
After the `ttl` period has expired, the XCJF query will re-execute the join against the remote collection.
|
||||
|
||||
`All others`
|
||||
Any normal Solr parameter can also be specified/passed through as a local param.
|
||||
Other Parameters::
|
||||
Any normal Solr query parameter can also be specified/passed through as a local param.
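The `ttl` behavior described above can be pictured as a time-based validity check on a cached result. The sketch below is illustrative only and is not Solr's filter cache implementation; the class and function names are hypothetical, and the `now` argument exists only so the logic can be exercised without waiting.

```python
import time

class TtlCacheEntry:
    """Hypothetical cache entry that is valid for `ttl_seconds` after creation."""

    def __init__(self, value, ttl_seconds=3600, now=None):
        self.value = value
        self.ttl_seconds = ttl_seconds
        self.created_at = time.time() if now is None else now

    def is_valid(self, now=None):
        now = time.time() if now is None else now
        return (now - self.created_at) < self.ttl_seconds

def get_join_result(cache, key, run_join, ttl_seconds=3600, now=None):
    """Return the cached result while it is valid; otherwise re-run the join."""
    entry = cache.get(key)
    if entry is None or not entry.is_valid(now):
        entry = TtlCacheEntry(run_join(), ttl_seconds, now)
        cache[key] = entry
    return entry.value
```

Within the `ttl` window the cached value is returned even if the remote collection has changed, which is why a shorter `ttl` trades cache efficiency for fresher results.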
|
||||
|
||||
=== XCJF Query Examples
|
||||
|
||||
|
|
|
@ -196,7 +196,7 @@ The parameter sets can be used directly in a request handler definition as follo
|
|||
To summarize, parameters are applied in this order:
|
||||
|
||||
* parameters defined in `<invariants>` in `solrconfig.xml`.
|
||||
* parameters applied in `invariants` in `params.json` and that is specified in the requesthandler definition or even in request
|
||||
* parameters applied in `invariants` in `params.json` that are specified in the request handler definition or even in a single request.
|
||||
* parameters defined in the request directly.
|
||||
* parameter sets defined in the request, in the order they have been listed with `useParams`.
|
||||
* parameter sets defined in `params.json` that have been defined in the request handler.
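One way to read the ordering above is as a precedence list in which the first source to define a parameter wins. The sketch below models only that reading: the function name and all parameter values are hypothetical, and behavior such as appended values is not modeled.

```python
def resolve_params(*sources):
    """Merge dicts given in precedence order; the first source to define a key wins."""
    resolved = {}
    for source in sources:
        for name, value in source.items():
            resolved.setdefault(name, value)  # keep the earlier (higher-precedence) value
    return resolved

# Hypothetical values, listed in the same order as the bullets above.
solrconfig_invariants = {"wt": "json"}
params_json_invariants = {"rows": "10"}
request_params = {"wt": "xml", "rows": "50", "q": "ipod"}
use_params_sets = [{"fl": "id,name"}, {"fl": "id", "sort": "price asc"}]
handler_param_sets = [{"sort": "score desc", "df": "text"}]

effective = resolve_params(
    solrconfig_invariants,
    params_json_invariants,
    request_params,
    *use_params_sets,
    *handler_param_sets,
)
```

In this model the request's `wt=xml` and `rows=50` are ignored because invariants define those parameters first, while `fl` comes from the first set listed with `useParams`.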
|
||||
|
|
|
@ -37,7 +37,7 @@ Prefer to put resources here.
|
|||
== Resources in Other Places
|
||||
|
||||
Resources can also be placed in an arbitrary directory and <<libs.adoc#lib-directives-in-solrconfig,referenced>> from a `<lib />` directive in `solrconfig.xml`, provided the directive refers to a directory and not the actual resource file. Example: `<lib path="/volume/models/" />`
|
||||
This choice may make sense if the resource is too large for a configSet in ZooKeeper.
|
||||
This choice may make sense if the resource is too large for a configset in ZooKeeper.
|
||||
However it's up to you to somehow ensure that all nodes in your cluster have access to these resources.
|
||||
|
||||
Finally, and this is very unusual, resources can also be packaged inside `.jar` files from which they will be referenced.
|
||||
|
|
|
@ -22,7 +22,7 @@ supported out of the box.
|
|||
|
||||
A sampled distributed tracing query request on Jaeger looks like this:
|
||||
|
||||
.Tracing of a solr query
|
||||
.Tracing of a Solr query
|
||||
image::images/solr-tracing/query-request-tracing.png[image,width=600]
|
||||
|
||||
== Setup Tracer
|
||||
|
|
|
@ -128,8 +128,8 @@ on a convenient organization of the index, and should only be considered if norm
|
|||
Streaming Expressions respect the <<distributed-requests.adoc#shards-preference-parameter,shards.preference parameter>> for any call to Solr.
|
||||
|
||||
The value of `shards.preference` that is used to route requests is determined in the following order. The first option available is used.
|
||||
- Provided as a parameter in the streaming expression (e.g. `search(...., shards.preference="replica.type:PULL")`)
|
||||
- Provided in the URL Params of the streaming expression (e.g. `http://solr_url:8983/solr/stream?expr=....&shards.preference=replica.type:PULL`)
|
||||
- Provided as a parameter in the streaming expression (e.g., `search(...., shards.preference="replica.type:PULL")`)
|
||||
- Provided in the URL Params of the streaming expression (e.g., `http://solr_url:8983/solr/stream?expr=....&shards.preference=replica.type:PULL`)
|
||||
- Set as a default in the Cluster properties.
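The "first option available is used" rule can be sketched as a simple fallback chain. The function and argument names below are hypothetical and only mirror the three sources listed above.

```python
def effective_shards_preference(expression_param=None, url_param=None, cluster_default=None):
    """Return the first value that is set, checking the sources in the listed order."""
    for candidate in (expression_param, url_param, cluster_default):
        if candidate is not None:
            return candidate
    return None  # no preference configured anywhere
```

For example, a value passed inside the streaming expression takes priority over one given in the URL parameters, and the cluster default applies only when neither is present.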
|
||||
|
||||
=== Adding Custom Expressions
|
||||
|
|
|
@ -153,7 +153,7 @@ To query for a field existing, simply use a wildcard instead of a term in the se
|
|||
|
||||
`field:*`
|
||||
|
||||
A field will be considered to "exist" if it has any value, even values which are often considered "not existent". (e.g. `NaN`, `""`, etc.)
|
||||
A field will be considered to "exist" if it has any value, even values which are often considered "not existent" (e.g., `NaN`, `""`, etc.).
|
||||
|
||||
=== Range Searches
|
||||
|
||||
|
@ -354,7 +354,7 @@ Solr's standard query parser originated as a variation of Lucene's "classic" Que
|
|||
** `field:[* TO 100]` finds all field values less than or equal to 100
|
||||
** `field:[100 TO *]` finds all field values greater than or equal to 100
|
||||
** `field:[* TO *]` finds all documents where the field has a value between `-Infinity` and `Infinity`, excluding `NaN`.
|
||||
** `field:*` finds all documents where the field exists (i.e. has any value).
|
||||
** `field:*` finds all documents where the field exists (i.e., has any value).
|
||||
* Pure negative queries (all clauses prohibited) are allowed (only as a top-level clause)
|
||||
** `-inStock:false` finds all field values where inStock is not false
|
||||
** `-field:*` finds all documents without a value for the field.
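The difference between `field:*` and `field:[* TO *]` is easiest to see with a `NaN` value. The following is a small model of the semantics described above for a numeric field, not Solr code; the documents and function names are hypothetical.

```python
import math

def matches_exists(doc, field):
    """Model of `field:*`: the document has any value for the field."""
    return field in doc

def matches_full_range(doc, field):
    """Model of `field:[* TO *]`: the value lies between -Infinity and Infinity."""
    value = doc.get(field)
    # Comparisons with NaN are always False, so NaN is excluded here.
    return value is not None and -math.inf <= value <= math.inf

docs = [
    {"id": 1, "price": 9.99},
    {"id": 2, "price": float("nan")},
    {"id": 3},  # no value for "price"
]

exists_ids = [d["id"] for d in docs if matches_exists(d, "price")]
range_ids = [d["id"] for d in docs if matches_full_range(d, "price")]
missing_ids = [d["id"] for d in docs if not matches_exists(d, "price")]  # model of `-field:*`
```

Document 2 matches the existence query but not the open-ended range query, and only document 3 matches the pure negative query.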
|
||||
|
|