SOLR-12684: put expression names and params in monospace

Cassandra Targett 2018-09-05 20:35:37 -05:00
parent 1a006556e5
commit 9c364b2d86
1 changed file with 12 additions and 11 deletions


@@ -970,18 +970,20 @@ outerHashJoin(
 The `parallel` function wraps a streaming expression and sends it to N worker nodes to be processed in parallel.
-The parallel function requires that the `partitionKeys` parameter be provided to the underlying searches. The `partitionKeys` parameter will partition the search results (tuples) across the worker nodes. Tuples with the same values in the partitionKeys field will be shuffled to the same worker nodes.
+The `parallel` function requires that the `partitionKeys` parameter be provided to the underlying searches. The `partitionKeys` parameter will partition the search results (tuples) across the worker nodes. Tuples with the same values in for `partitionKeys` will be shuffled to the same worker nodes.
-The parallel function maintains the sort order of the tuples returned by the worker nodes, so the sort criteria of the parallel function must incorporate the sort order of the tuples returned by the workers.
+The `parallel` function maintains the sort order of the tuples returned by the worker nodes, so the sort criteria must incorporate the sort order of the tuples returned by the workers.
-For example if you sort on year, month and day you could partition on year only as long as there was enough different years to spread the tuples around the worker nodes.
+For example if you sort on year, month and day you could partition on year only as long as there are enough different years to spread the tuples around the worker nodes.
-Solr allows sorting on more than 4 fields, but you cannot specify more than 4 partitionKeys for speed tradeoffs. Also it's an overkill to specify many partitionKeys when we one or two keys could be enough to spread the tuples.
-Parallel Stream was designed when the underlying search stream will emit a lot of tuples from the collection. If the search stream only emits a small subset of the data from the collection using parallel could potentially be slower.
+Solr allows sorting on more than 4 fields, but you cannot specify more than 4 partitionKeys for speed considerations. Also it's overkill to specify many `partitionKeys` when we one or two keys could be enough to spread the tuples.
+Parallel stream was designed when the underlying search stream will emit a lot of tuples from the collection. If the search stream only emits a small subset of the data from the collection using `parallel` could potentially be slower.
 .Worker Collections
 [TIP]
 ====
-The worker nodes can be from the same collection as the data, or they can be a different collection entirely, even one that only exists for parallel streaming expressions. A worker collection can be any SolrCloud collection that has the `/stream` handler configured. Unlike normal SolrCloud collections, worker collections don't have to hold any data. Worker collections can be empty collections that exist only to execute streaming expressions.
+The worker nodes can be from the same collection as the data, or they can be a different collection entirely, even one that only exists for `parallel` streaming expressions. A worker collection can be any SolrCloud collection that has the `/stream` handler configured. Unlike normal SolrCloud collections, worker collections don't have to hold any data. Worker collections can be empty collections that exist only to execute streaming expressions.
 ====
 === parallel Parameters
@@ -1009,11 +1011,11 @@ The expression above shows a `parallel` function wrapping a `reduce` function. T
 .Warmup
 [TIP]
 ====
-The parallel stream uses the hash query parser to split the data amongst the workers. It executes on all the documents and the result bitset is cached in the filterCache.
+The `parallel` function uses the hash query parser to split the data amongst the workers. It executes on all the documents and the result bitset is cached in the filterCache.
-For a parallel stream with the same number of workers and partitonKeys the first query would be slower than subsequent queries. +
+For a `parallel` stream with the same number of workers and `partitonKeys` the first query would be slower than subsequent queries.
 A trick to not pay the penalty for the first slow query would be to use a warmup query for every new searcher.
-The following is a solrconfig.xml snippet for 2 workers and "year_i" as the partionKeys.
+The following is a `solrconfig.xml` snippet for 2 workers and "year_i" as the `partionKeys`.
 [source,text]
 ----
@@ -1024,7 +1026,6 @@ The following is a solrconfig.xml snippet for 2 workers and "year_i" as the part
 </arr>
 </listener>
 ----
-====
 == priority
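
The partitioning behavior described in this diff (tuples with equal `partitionKeys` values go to the same worker, and the `parallel` function merges the workers' sorted output so the overall sort order survives) can be sketched as a toy Python model. This is not Solr code: the tuple shape, the helper names `worker_for`, `partition` and `merge_worker_results`, and the use of `zlib.crc32` as the hash are hypothetical stand-ins; Solr's hash query parser uses its own hashing. The example follows the text's scenario of sorting on year, month and day while partitioning on year only.

```python
import heapq
import zlib
from typing import Dict, List, Tuple

# Hypothetical tuple shape: (year, month, day, payload).
Doc = Tuple[int, int, int, str]


def worker_for(partition_value: int, num_workers: int) -> int:
    """Pick a worker by hashing the partition key value.

    zlib.crc32 is only a stand-in; Solr's hash query parser uses its own hash.
    """
    return zlib.crc32(str(partition_value).encode("utf-8")) % num_workers


def partition(docs: List[Doc], num_workers: int) -> Dict[int, List[Doc]]:
    """Split sorted docs across workers, partitioning on the year field only.

    Each worker's list stays sorted because it is a subsequence of the input.
    """
    shards: Dict[int, List[Doc]] = {w: [] for w in range(num_workers)}
    for doc in docs:
        shards[worker_for(doc[0], num_workers)].append(doc)
    return shards


def merge_worker_results(shards: Dict[int, List[Doc]]) -> List[Doc]:
    """Merge each worker's sorted output into one stream sorted by
    (year, month, day), mirroring how `parallel` preserves sort order."""
    return list(heapq.merge(*shards.values(), key=lambda d: d[:3]))


docs = sorted([
    (2016, 1, 5, "a"), (2016, 3, 2, "b"), (2017, 2, 9, "c"),
    (2017, 7, 1, "d"), (2018, 4, 4, "e"), (2018, 12, 31, "f"),
])
shards = partition(docs, num_workers=2)

# Every tuple for a given year lands on exactly one worker.
for year in {d[0] for d in docs}:
    owners = {w for w, rows in shards.items() if any(r[0] == year for r in rows)}
    assert len(owners) == 1

merged = merge_worker_results(shards)
print(merged == docs)  # True: the merged stream keeps the original sort order
```

The merge only works because the partition key (year) is a prefix of the sort criteria; with only three distinct years, at most three workers could ever receive data, which is the "enough different years" caveat in the text.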