mirror of https://github.com/apache/lucene.git
SOLR-14572 document missing SearchComponents (#1581)
* Add an example explaining how to use
* fix up JavaDoc formatting
* add missing SearchComponents that ship with Solr, and point to external site with components.
* fix path
* simplify page layout by consolidating to lists
* add missing components that are documented elsewhere in refguide
* try to get pathing to pass precommit
* remove mention of solr.cool, in favour of a separate PR that handles it differently
parent ea0ad3ec51
commit 207efbceeb
@ -33,14 +33,70 @@ import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

/**
 * <p>
 * Update processor which examines a URL and writes characteristics of that URL
 * to various other fields, including its length, the number of path levels, whether
 * it is a top level URL (levels==0), whether it looks like a landing/index page,
 * a canonical representation of the URL (e.g. stripping index.html), and the domain
 * and path parts of the URL.
 * </p>
 *
 * <p>
 * This processor is intended to be used when processing web resources,
 * helping to produce values which may later be used for boosting or filtering.
 * </p>
 *
 * <p>
 * In the example configuration below, we construct a custom
 * <code>updateRequestProcessorChain</code> and then instruct the
 * <code>/update</code> request handler to use it for every incoming document.
 * </p>
 * <pre class="prettyprint">
 * &lt;updateRequestProcessorChain name="urlProcessor"&gt;
 *   &lt;processor class="org.apache.solr.update.processor.URLClassifyProcessorFactory"&gt;
 *     &lt;bool name="enabled"&gt;true&lt;/bool&gt;
 *     &lt;str name="inputField"&gt;id&lt;/str&gt;
 *     &lt;str name="domainOutputField"&gt;hostname&lt;/str&gt;
 *   &lt;/processor&gt;
 *   &lt;processor class="solr.RunUpdateProcessorFactory" /&gt;
 * &lt;/updateRequestProcessorChain&gt;
 *
 * &lt;requestHandler name="/update" class="solr.UpdateRequestHandler"&gt;
 *   &lt;lst name="defaults"&gt;
 *     &lt;str name="update.chain"&gt;urlProcessor&lt;/str&gt;
 *   &lt;/lst&gt;
 * &lt;/requestHandler&gt;
 * </pre>
 * <p>
 * Then, at index time, Solr will look at the <code>id</code> field value and extract
 * its domain portion into a new <code>hostname</code> field. By default, the
 * following fields will also be added:
 * </p>
 * <ul>
 * <li>url_length</li>
 * <li>url_levels</li>
 * <li>url_toplevel</li>
 * <li>url_landingpage</li>
 * </ul>
 * <p>
 * For example, adding the following document:
 * </p>
 * <pre class="prettyprint">
 * { "id":"http://wwww.mydomain.com/subpath/document.html" }
 * </pre>
 * <p>
 * will result in this document in Solr:
 * </p>
 * <pre class="prettyprint">
 * {
 *   "id":"http://wwww.mydomain.com/subpath/document.html",
 *   "url_length":["46"],
 *   "url_levels":["2"],
 *   "url_toplevel":["0"],
 *   "url_landingpage":["0"],
 *   "hostname":["wwww.mydomain.com"],
 *   "_version_":1603193062117343232
 * }
 * </pre>
 */
public class URLClassifyProcessor extends UpdateRequestProcessor {
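To make the features listed in the Javadoc concrete, here is a minimal, hypothetical Java sketch of how they could be derived from a URL with `java.net.URI`. It is not the `URLClassifyProcessor` implementation: the class name `UrlFeaturesSketch`, the set of landing-page file names, and the rule that a URL with a query string is never a landing page are all assumptions made for illustration. Run against the example `id`, it produces the same values as the sample document (length 46, 2 levels, not top level, not a landing page, host `wwww.mydomain.com`).

```java
import java.net.URI;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical sketch of the URL features described in the Javadoc above.
// NOT the URLClassifyProcessor implementation; names and rules are assumptions.
final class UrlFeaturesSketch {

  // Assumption: file names treated as landing/index pages in this sketch.
  static final List<String> LANDING_PAGE_NAMES =
      Arrays.asList("index.html", "index.htm", "index.php", "default.asp");

  // Number of characters in the URL string (the url_length value).
  static int length(String url) {
    return url.length();
  }

  // Raw path of the URI, or "" when it has none (e.g. "http://host").
  static String pathOf(URI uri) {
    String path = uri.getRawPath();
    return path == null ? "" : path;
  }

  // Last path segment, e.g. "document.html" for "/subpath/document.html".
  static String lastSegment(String path) {
    return path.substring(path.lastIndexOf('/') + 1);
  }

  // Counts '/' characters in the path after dropping trailing slashes,
  // so "/subpath/document.html" has 2 levels and "/" has 0.
  static int levels(URI uri) {
    String path = pathOf(uri).replaceAll("/+$", "");
    int levels = 0;
    for (char c : path.toCharArray()) {
      if (c == '/') {
        levels++;
      }
    }
    return levels;
  }

  // Top level means levels == 0, as the Javadoc describes.
  static boolean isTopLevel(URI uri) {
    return levels(uri) == 0;
  }

  // Heuristic: no query string, and the path is empty, ends in '/',
  // or ends in one of the assumed index file names.
  static boolean isLandingPage(URI uri) {
    if (uri.getRawQuery() != null) {
      return false;
    }
    String path = pathOf(uri);
    return path.isEmpty()
        || path.endsWith("/")
        || LANDING_PAGE_NAMES.contains(lastSegment(path).toLowerCase(Locale.ROOT));
  }

  // Canonical form: strip a trailing index file name, keeping the slash.
  static String canonical(URI uri) {
    String s = uri.toString();
    if (uri.getRawQuery() != null || uri.getRawFragment() != null) {
      return s; // leave URLs with a query or fragment untouched
    }
    String name = lastSegment(pathOf(uri));
    if (LANDING_PAGE_NAMES.contains(name.toLowerCase(Locale.ROOT))) {
      return s.substring(0, s.length() - name.length());
    }
    return s;
  }

  public static void main(String[] args) {
    String url = "http://wwww.mydomain.com/subpath/document.html";
    URI uri = URI.create(url);
    System.out.println("url_length=" + length(url));                       // 46
    System.out.println("url_levels=" + levels(uri));                       // 2
    System.out.println("url_toplevel=" + (isTopLevel(uri) ? 1 : 0));       // 0
    System.out.println("url_landingpage=" + (isLandingPage(uri) ? 1 : 0)); // 0
    System.out.println("hostname=" + uri.getHost());                       // wwww.mydomain.com
  }
}
```

For instance, `canonical(URI.create("http://wwww.mydomain.com/subpath/index.html"))` returns `http://wwww.mydomain.com/subpath/`. The landing-page name list and the query-string rule are the main judgment calls in this sketch; the real processor may use different rules.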
@ -169,3 +169,12 @@ Many of the other useful components are described in sections of this Guide for
* `TermVectorComponent`, described in the section <<the-term-vector-component.adoc#the-term-vector-component,The Term Vector Component>>.
* `QueryElevationComponent`, described in the section <<the-query-elevation-component.adoc#the-query-elevation-component,The Query Elevation Component>>.
* `TermsComponent`, described in the section <<the-terms-component.adoc#the-terms-component,The Terms Component>>.
* `RealTimeGetComponent`, described in the section <<realtime-get.adoc#realtime-get,RealTime Get>>.
* `ClusteringComponent`, described in the section <<result-clustering.adoc#result-clustering,Result Clustering>>.
* `SuggestComponent`, described in the section <<suggester.adoc#suggester,Suggester>>.
* `AnalyticsComponent`, described in the section <<analytics.adoc#analytics,Analytics>>.

Other components that ship with Solr include:

* `ResponseLogComponent`, used to record in the Solr log which documents are returned to the user, described in the {solr-javadocs}solr-core/org/apache/solr/handler/component/ResponseLogComponent.html[ResponseLogComponent] javadocs.
* `PhrasesIdentificationComponent`, used to identify and score "phrases" found in the input string based on shingles in indexed fields, described in the {solr-javadocs}solr-core/org/apache/solr/handler/component/PhrasesIdentificationComponent.html[PhrasesIdentificationComponent] javadocs.