From 9c364b2d8640e84a2fe3b7a8d8adfc20d3d53e38 Mon Sep 17 00:00:00 2001
From: Cassandra Targett
Date: Wed, 5 Sep 2018 20:35:37 -0500
Subject: [PATCH] SOLR-12684: put expression names and params in monospace

---
 .../src/stream-decorator-reference.adoc | 23 ++++++++++---------
 1 file changed, 12 insertions(+), 11 deletions(-)

diff --git a/solr/solr-ref-guide/src/stream-decorator-reference.adoc b/solr/solr-ref-guide/src/stream-decorator-reference.adoc
index b397192d326..08f1e7ae1e2 100644
--- a/solr/solr-ref-guide/src/stream-decorator-reference.adoc
+++ b/solr/solr-ref-guide/src/stream-decorator-reference.adoc
@@ -970,18 +970,20 @@ outerHashJoin(
 
 The `parallel` function wraps a streaming expression and sends it to N worker nodes to be processed in parallel.
 
-The parallel function requires that the `partitionKeys` parameter be provided to the underlying searches. The `partitionKeys` parameter will partition the search results (tuples) across the worker nodes. Tuples with the same values in the partitionKeys field will be shuffled to the same worker nodes.
+The `parallel` function requires that the `partitionKeys` parameter be provided to the underlying searches. The `partitionKeys` parameter will partition the search results (tuples) across the worker nodes. Tuples with the same values in for `partitionKeys` will be shuffled to the same worker nodes.
 
-The parallel function maintains the sort order of the tuples returned by the worker nodes, so the sort criteria of the parallel function must incorporate the sort order of the tuples returned by the workers.
+The `parallel` function maintains the sort order of the tuples returned by the worker nodes, so the sort criteria must incorporate the sort order of the tuples returned by the workers.
 
-For example if you sort on year, month and day you could partition on year only as long as there was enough different years to spread the tuples around the worker nodes.
-Solr allows sorting on more than 4 fields, but you cannot specify more than 4 partitionKeys for speed tradeoffs. Also it's an overkill to specify many partitionKeys when we one or two keys could be enough to spread the tuples.
-Parallel Stream was designed when the underlying search stream will emit a lot of tuples from the collection. If the search stream only emits a small subset of the data from the collection using parallel could potentially be slower.
+For example if you sort on year, month and day you could partition on year only as long as there are enough different years to spread the tuples around the worker nodes.
+
+Solr allows sorting on more than 4 fields, but you cannot specify more than 4 partitionKeys for speed considerations. Also it's overkill to specify many `partitionKeys` when we one or two keys could be enough to spread the tuples.
+
+Parallel stream was designed when the underlying search stream will emit a lot of tuples from the collection. If the search stream only emits a small subset of the data from the collection using `parallel` could potentially be slower.
 
 .Worker Collections
 [TIP]
 ====
-The worker nodes can be from the same collection as the data, or they can be a different collection entirely, even one that only exists for parallel streaming expressions. A worker collection can be any SolrCloud collection that has the `/stream` handler configured. Unlike normal SolrCloud collections, worker collections don't have to hold any data. Worker collections can be empty collections that exist only to execute streaming expressions.
+The worker nodes can be from the same collection as the data, or they can be a different collection entirely, even one that only exists for `parallel` streaming expressions. A worker collection can be any SolrCloud collection that has the `/stream` handler configured. Unlike normal SolrCloud collections, worker collections don't have to hold any data. Worker collections can be empty collections that exist only to execute streaming expressions.
 ====
 
 === parallel Parameters
@@ -1009,11 +1011,11 @@ The expression above shows a `parallel` function wrapping a `reduce` function. T
 .Warmup
 [TIP]
 ====
-The parallel stream uses the hash query parser to split the data amongst the workers. It executes on all the documents and the result bitset is cached in the filterCache.
-For a parallel stream with the same number of workers and partitonKeys the first query would be slower than subsequent queries.
+The `parallel` function uses the hash query parser to split the data amongst the workers. It executes on all the documents and the result bitset is cached in the filterCache.
++
+For a `parallel` stream with the same number of workers and `partitonKeys` the first query would be slower than subsequent queries.
 A trick to not pay the penalty for the first slow query would be to use a warmup query for every new searcher.
-The following is a solrconfig.xml snippet for 2 workers and "year_i" as the partionKeys.
-
+The following is a `solrconfig.xml` snippet for 2 workers and "year_i" as the `partionKeys`.
 
 [source,text]
 ----
@@ -1024,7 +1026,6 @@ The following is a solrconfig.xml snippet for 2 workers and "year_i" as the part
 ----
-
 ====
 
 == priority