SOLR-2436: move uimaConfig to under the uima's update processor in solrconfig.xml

git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1096315 13f79535-47bb-0310-9956-ffa450edef68
Koji Sekiguchi 2011-04-24 11:48:43 +00:00
parent 8be8081a49
commit 6a85962022
8 changed files with 180 additions and 185 deletions

@@ -21,11 +21,25 @@ $Id$
 ================== 3.2.0-dev ==================
-Test Cases:
+Upgrading from Solr 3.1
 ----------------------
-* SOLR-2387: add mock annotators for improved testing,
+* <uimaConfig/> just beneath <config> ... </config> is no longer supported.
+  It should move to UIMAUpdateRequestProcessorFactory setting.
+  See contrib/uima/README.txt for more details. (SOLR-2436)
+Test Cases:
+----------------------
+* SOLR-2387: add mock annotators for improved testing,
   (Tommaso Teofili via rmuir)
-================== 3.1.0-dev ==================
+Other Changes
+----------------------
+* SOLR-2436: move uimaConfig to under the uima's update processor in solrconfig.xml.
+  (Tommaso Teofili, koji)
+================== 3.1.0 ==================
 Initial Release

@@ -3,38 +3,61 @@ Getting Started
 To start using Solr UIMA Metadata Extraction Library you should go through the following configuration steps:
 1. copy generated solr-uima jar and its libs (under contrib/uima/lib) inside a Solr libraries directory.
+   or set <lib/> tags in solrconfig.xml appropriately to point those jar files.
+   <lib dir="../../contrib/uima/lib" />
+   <lib dir="../../dist/" regex="apache-solr-uima-\d.*\.jar" />
 2. modify your schema.xml adding the fields you want to be hold metadata specifying proper values for type, indexed, stored and multiValued options:
    for example you could specify the following
    <field name="language" type="string" indexed="true" stored="true" required="false"/>
    <field name="concept" type="string" indexed="true" stored="true" multiValued="true" required="false"/>
    <field name="sentence" type="text" indexed="true" stored="true" multiValued="true" required="false" />
 3. modify your solrconfig.xml adding the following snippet:
-   <uimaConfig>
-     <runtimeParameters>
-       <keyword_apikey>VALID_ALCHEMYAPI_KEY</keyword_apikey>
-       <concept_apikey>VALID_ALCHEMYAPI_KEY</concept_apikey>
-       <lang_apikey>VALID_ALCHEMYAPI_KEY</lang_apikey>
-       <cat_apikey>VALID_ALCHEMYAPI_KEY</cat_apikey>
-       <entities_apikey>VALID_ALCHEMYAPI_KEY</entities_apikey>
-       <oc_licenseID>VALID_OPENCALAIS_KEY</oc_licenseID>
-     </runtimeParameters>
-     <analysisEngine>/org/apache/uima/desc/OverridingParamsExtServicesAE.xml</analysisEngine>
-     <analyzeFields merge="false">text</analyzeFields>
-     <fieldMapping>
-       <type name="org.apache.uima.alchemy.ts.concept.ConceptFS">
-         <map feature="text" field="concept"/>
-       </type>
-       <type name="org.apache.uima.alchemy.ts.language.LanguageFS">
-         <map feature="language" field="language"/>
-       </type>
-       <type name="org.apache.uima.SentenceAnnotation">
-         <map feature="coveredText" field="sentence"/>
-       </type>
-     </fieldMapping>
-   </uimaConfig>
+   <updateRequestProcessorChain name="uima">
+     <processor class="org.apache.solr.uima.processor.UIMAUpdateRequestProcessorFactory">
+       <lst name="uimaConfig">
+         <lst name="runtimeParameters">
+           <str name="keyword_apikey">VALID_ALCHEMYAPI_KEY</str>
+           <str name="concept_apikey">VALID_ALCHEMYAPI_KEY</str>
+           <str name="lang_apikey">VALID_ALCHEMYAPI_KEY</str>
+           <str name="cat_apikey">VALID_ALCHEMYAPI_KEY</str>
+           <str name="entities_apikey">VALID_ALCHEMYAPI_KEY</str>
+           <str name="oc_licenseID">VALID_OPENCALAIS_KEY</str>
+         </lst>
+         <str name="analysisEngine">/org/apache/uima/desc/OverridingParamsExtServicesAE.xml</str>
+         <lst name="analyzeFields">
+           <bool name="merge">false</bool>
+           <arr name="fields">
+             <str>text</str>
+           </arr>
+         </lst>
+         <lst name="fieldMappings">
+           <lst name="mapping">
+             <str name="type">org.apache.uima.alchemy.ts.concept.ConceptFS</str>
+             <str name="feature">text</str>
+             <str name="field">concept</str>
+           </lst>
+           <lst name="mapping">
+             <str name="type">org.apache.uima.alchemy.ts.language.LanguageFS</str>
+             <str name="feature">language</str>
+             <str name="field">language</str>
+           </lst>
+           <lst name="mapping">
+             <str name="type">org.apache.uima.SentenceAnnotation</str>
+             <str name="feature">coveredText</str>
+             <str name="field">sentence</str>
+           </lst>
+         </lst>
+       </lst>
+     </processor>
+     <processor class="solr.LogUpdateProcessorFactory" />
+     <processor class="solr.RunUpdateProcessorFactory" />
+   </updateRequestProcessorChain>
    where VALID_ALCHEMYAPI_KEY is your AlchemyAPI Access Key. You need to register AlchemyAPI Access
    key to exploit the AlchemyAPI services: http://www.alchemyapi.com/api/register.html
@@ -42,21 +65,14 @@ To start using Solr UIMA Metadata Extraction Library you should go through the f
    where VALID_OPENCALAIS_KEY is your Calais Service Key. You need to register Calais Service
    key to exploit the Calais services: http://www.opencalais.com/apikey
-   the analysisEngine tag must contain an AE descriptor inside the specified path in the classpath
-   the analyzeFields tag must contain the input fields that need to be analyzed by UIMA,
+   the analysisEngine must contain an AE descriptor inside the specified path in the classpath
+   the analyzeFields must contain the input fields that need to be analyzed by UIMA,
    if merge=true then their content will be merged and analyzed only once
    field mapping describes which features of which types should go in a field
-4. define in your solrconfig.xml an UpdateRequestProcessorChain as following:
-   <updateRequestProcessorChain name="uima">
-     <processor class="org.apache.solr.uima.processor.UIMAUpdateRequestProcessorFactory"/>
-     <processor class="solr.LogUpdateProcessorFactory" />
-     <processor class="solr.RunUpdateProcessorFactory" />
-   </updateRequestProcessorChain>
-5. in your solrconfig.xml replace the existing default (<requestHandler name="/update"...) or create a new UpdateRequestHandler with the following:
+4. in your solrconfig.xml replace the existing default (<requestHandler name="/update"...) or create a new UpdateRequestHandler with the following:
    <requestHandler name="/update" class="solr.XmlUpdateRequestHandler">
      <lst name="defaults">
        <str name="update.processor">uima</str>

@@ -65,5 +65,4 @@ public class SolrUIMAConfiguration {
   public Map<String, Object> getRuntimeParameters() {
     return runtimeParameters;
   }
 }

@@ -18,11 +18,10 @@ package org.apache.solr.uima.processor;
  */
 import java.util.HashMap;
+import java.util.List;
 import java.util.Map;
-import org.apache.solr.core.SolrConfig;
-import org.w3c.dom.Node;
-import org.w3c.dom.NodeList;
+import org.apache.solr.common.util.NamedList;
 /**
  * Read configuration for Solr-UIMA integration
@@ -32,18 +31,10 @@ import org.w3c.dom.NodeList;
  */
 public class SolrUIMAConfigurationReader {
-  private static final String AE_RUNTIME_PARAMETERS_NODE_PATH = "/config/uimaConfig/runtimeParameters";
-  private static final String FIELD_MAPPING_NODE_PATH = "/config/uimaConfig/fieldMapping";
-  private static final String ANALYZE_FIELDS_NODE_PATH = "/config/uimaConfig/analyzeFields";
-  private static final String ANALYSIS_ENGINE_NODE_PATH = "/config/uimaConfig/analysisEngine";
-  private SolrConfig solrConfig;
-  public SolrUIMAConfigurationReader(SolrConfig solrConfig) {
-    this.solrConfig = solrConfig;
+  private NamedList<Object> args;
+  public SolrUIMAConfigurationReader(NamedList<Object> args) {
+    this.args = args;
   }
   public SolrUIMAConfiguration readSolrUIMAConfiguration() {
@@ -52,73 +43,51 @@ public class SolrUIMAConfigurationReader {
   }
   private String readAEPath() {
-    return solrConfig.getNode(ANALYSIS_ENGINE_NODE_PATH, true).getTextContent();
+    return (String) args.get("analysisEngine");
   }
+  @SuppressWarnings("rawtypes")
+  private NamedList getAnalyzeFields() {
+    return (NamedList) args.get("analyzeFields");
+  }
+  @SuppressWarnings("unchecked")
   private String[] readFieldsToAnalyze() {
-    Node analyzeFieldsNode = solrConfig.getNode(ANALYZE_FIELDS_NODE_PATH, true);
-    return analyzeFieldsNode.getTextContent().split(",");
+    List<String> fields = (List<String>) getAnalyzeFields().get("fields");
+    return fields.toArray(new String[fields.size()]);
   }
   private boolean readFieldsMerging() {
-    Node analyzeFieldsNode = solrConfig.getNode(ANALYZE_FIELDS_NODE_PATH, true);
-    Node mergeNode = analyzeFieldsNode.getAttributes().getNamedItem("merge");
-    return Boolean.valueOf(mergeNode.getNodeValue());
+    return (Boolean) getAnalyzeFields().get("merge");
   }
+  @SuppressWarnings("rawtypes")
   private Map<String, Map<String, String>> readTypesFeaturesFieldsMapping() {
     Map<String, Map<String, String>> map = new HashMap<String, Map<String, String>>();
-    Node fieldMappingNode = solrConfig.getNode(FIELD_MAPPING_NODE_PATH, true);
+    NamedList fieldMappings = (NamedList) args.get("fieldMappings");
     /* iterate over UIMA types */
-    if (fieldMappingNode.hasChildNodes()) {
-      NodeList typeNodes = fieldMappingNode.getChildNodes();
-      for (int i = 0; i < typeNodes.getLength(); i++) {
-        /* <type> node */
-        Node typeNode = typeNodes.item(i);
-        if (typeNode.getNodeType() != Node.TEXT_NODE) {
-          Node typeNameAttribute = typeNode.getAttributes().getNamedItem("name");
-          /* get a UIMA typename */
-          String typeName = typeNameAttribute.getNodeValue();
-          /* create entry for UIMA type */
-          map.put(typeName, new HashMap<String, String>());
-          if (typeNode.hasChildNodes()) {
-            /* iterate over features */
-            NodeList featuresNodeList = typeNode.getChildNodes();
-            for (int j = 0; j < featuresNodeList.getLength(); j++) {
-              Node mappingNode = featuresNodeList.item(j);
-              if (mappingNode.getNodeType() != Node.TEXT_NODE) {
-                /* get field name */
-                Node fieldNameNode = mappingNode.getAttributes().getNamedItem("field");
-                String mappedFieldName = fieldNameNode.getNodeValue();
-                /* get feature name */
-                Node featureNameNode = mappingNode.getAttributes().getNamedItem("feature");
-                String featureName = featureNameNode.getNodeValue();
-                /* map the feature to the field for the specified type */
-                map.get(typeName).put(featureName, mappedFieldName);
-              }
-            }
-          }
-        }
-      }
+    for (int i = 0; i < fieldMappings.size(); i++) {
+      NamedList mapping = (NamedList) fieldMappings.get("mapping", i);
+      String typeName = (String) mapping.get("type");
+      String featureName = (String) mapping.get("feature");
+      String mappedFieldName = (String) mapping.get("field");
+      Map<String, String> subMap = new HashMap<String, String>();
+      subMap.put(featureName, mappedFieldName);
+      map.put(typeName, subMap);
     }
     return map;
   }
+  @SuppressWarnings("rawtypes")
   private Map<String, Object> readAEOverridingParameters() {
     Map<String, Object> runtimeParameters = new HashMap<String, Object>();
-    Node uimaConfigNode = solrConfig.getNode(AE_RUNTIME_PARAMETERS_NODE_PATH, true);
-    if (uimaConfigNode.hasChildNodes()) {
-      NodeList overridingNodes = uimaConfigNode.getChildNodes();
-      for (int i = 0; i < overridingNodes.getLength(); i++) {
-        Node overridingNode = overridingNodes.item(i);
-        if (overridingNode.getNodeType() != Node.TEXT_NODE && overridingNode.getNodeType() != Node.COMMENT_NODE) {
-          runtimeParameters.put(overridingNode.getNodeName(), overridingNode.getTextContent());
-        }
-      }
+    NamedList runtimeParams = (NamedList) args.get("runtimeParameters");
+    for (int i = 0; i < runtimeParams.size(); i++) {
+      String name = runtimeParams.getName(i);
+      Object value = runtimeParams.getVal(i);
+      runtimeParameters.put(name, value);
     }
     return runtimeParameters;
   }

@@ -43,15 +43,14 @@ public class UIMAUpdateRequestProcessor extends UpdateRequestProcessor {
   private AEProvider aeProvider;
-  public UIMAUpdateRequestProcessor(UpdateRequestProcessor next, SolrCore solrCore) {
+  public UIMAUpdateRequestProcessor(UpdateRequestProcessor next, SolrCore solrCore,
+      SolrUIMAConfiguration config) {
     super(next);
-    initialize(solrCore);
+    initialize(solrCore, config);
   }
-  private void initialize(SolrCore solrCore) {
-    SolrUIMAConfigurationReader uimaConfigurationReader = new SolrUIMAConfigurationReader(solrCore
-        .getSolrConfig());
-    solrUIMAConfiguration = uimaConfigurationReader.readSolrUIMAConfiguration();
+  private void initialize(SolrCore solrCore, SolrUIMAConfiguration config) {
+    solrUIMAConfiguration = config;
     aeProvider = AEProviderFactory.getInstance().getAEProvider(solrCore.getName(),
         solrUIMAConfiguration.getAePath(), solrUIMAConfiguration.getRuntimeParameters());
   }

@@ -17,6 +17,7 @@ package org.apache.solr.uima.processor;
  * limitations under the License.
  */
+import org.apache.solr.common.util.NamedList;
 import org.apache.solr.request.SolrQueryRequest;
 import org.apache.solr.response.SolrQueryResponse;
 import org.apache.solr.update.processor.UpdateRequestProcessor;
@@ -29,10 +30,19 @@ import org.apache.solr.update.processor.UpdateRequestProcessorFactory;
  */
 public class UIMAUpdateRequestProcessorFactory extends UpdateRequestProcessorFactory {
+  private NamedList<Object> args;
+  @SuppressWarnings("unchecked")
+  @Override
+  public void init(@SuppressWarnings("rawtypes") NamedList args) {
+    this.args = (NamedList<Object>) args.get("uimaConfig");
+  }
   @Override
   public UpdateRequestProcessor getInstance(SolrQueryRequest req, SolrQueryResponse rsp,
       UpdateRequestProcessor next) {
-    return new UIMAUpdateRequestProcessor(next, req.getCore());
+    return new UIMAUpdateRequestProcessor(next, req.getCore(),
+        new SolrUIMAConfigurationReader(args).readSolrUIMAConfiguration());
   }
 }

@@ -15,19 +15,34 @@
  limitations under the License.
 -->
-<uimaConfig>
-  <runtimeParameters>
-    <keyword_apikey>VALID_ALCHEMYAPI_KEY</keyword_apikey>
-    <concept_apikey>VALID_ALCHEMYAPI_KEY</concept_apikey>
-    <lang_apikey>VALID_ALCHEMYAPI_KEY</lang_apikey>
-    <cat_apikey>VALID_ALCHEMYAPI_KEY</cat_apikey>
-    <oc_licenseID>VALID_OPENCALAIS_KEY</oc_licenseID>
-  </runtimeParameters>
-  <analysisEngine>/org/apache/uima/desc/OverridingParamsExtServicesAE.xml</analysisEngine>
-  <analyzeFields merge="false">text,title</analyzeFields>
-  <fieldMapping>
-    <type name="org.apache.uima.jcas.tcas.Annotation">
-      <map feature="coveredText" field="tag"/>
-    </type>
-  </fieldMapping>
-</uimaConfig>
+<updateRequestProcessorChain name="uima">
+  <processor class="org.apache.solr.uima.processor.UIMAUpdateRequestProcessorFactory">
+    <lst name="uimaConfig">
+      <lst name="runtimeParameters">
+        <str name="keyword_apikey">VALID_ALCHEMYAPI_KEY</str>
+        <str name="concept_apikey">VALID_ALCHEMYAPI_KEY</str>
+        <str name="lang_apikey">VALID_ALCHEMYAPI_KEY</str>
+        <str name="cat_apikey">VALID_ALCHEMYAPI_KEY</str>
+        <str name="entities_apikey">VALID_ALCHEMYAPI_KEY</str>
+        <str name="oc_licenseID">VALID_OPENCALAIS_KEY</str>
+      </lst>
+      <str name="analysisEngine">/org/apache/uima/desc/OverridingParamsExtServicesAE.xml</str>
+      <lst name="analyzeFields">
+        <bool name="merge">false</bool>
+        <arr name="fields">
+          <str>text</str>
+          <str>title</str>
+        </arr>
+      </lst>
+      <lst name="fieldMappings">
+        <lst name="mapping">
+          <str name="type">org.apache.uima.jcas.tcas.Annotation</str>
+          <str name="feature">convertText</str>
+          <str name="field">tag</str>
+        </lst>
+      </lst>
+    </lst>
+  </processor>
+  <processor class="solr.LogUpdateProcessorFactory" />
+  <processor class="solr.RunUpdateProcessorFactory" />
+</updateRequestProcessorChain>

@@ -953,42 +953,6 @@
     </lst>
   </requestHandler>
-  <highlighting>
-    <!-- Configure the standard fragmenter -->
-    <!-- This could most likely be commented out in the "default" case -->
-    <fragmenter name="gap"
-      class="org.apache.solr.highlight.GapFragmenter" default="true">
-      <lst name="defaults">
-        <int name="hl.fragsize">100</int>
-      </lst>
-    </fragmenter>
-    <!--
-      A regular-expression-based fragmenter (f.i., for sentence
-      extraction)
-    -->
-    <fragmenter name="regex"
-      class="org.apache.solr.highlight.RegexFragmenter">
-      <lst name="defaults">
-        <!-- slightly smaller fragsizes work better because of slop -->
-        <int name="hl.fragsize">70</int>
-        <!-- allow 50% slop on fragment sizes -->
-        <float name="hl.regex.slop">0.5</float>
-        <!-- a basic sentence pattern -->
-        <str name="hl.regex.pattern">[-\w ,/\n\"']{20,200}</str>
-      </lst>
-    </fragmenter>
-    <!-- Configure the standard formatter -->
-    <formatter name="html" class="org.apache.solr.highlight.HtmlFormatter"
-      default="true">
-      <lst name="defaults">
-        <str name="hl.simple.pre"><![CDATA[<em>]]></str>
-        <str name="hl.simple.post"><![CDATA[</em>]]></str>
-      </lst>
-    </formatter>
-  </highlighting>
   <!--
     An example dedup update processor that creates the "id" field on the
     fly based on the hash code of some other fields. This example has
@@ -1001,13 +965,41 @@
   -->
   <updateRequestProcessorChain name="uima">
-    <processor class="org.apache.solr.uima.processor.UIMAUpdateRequestProcessorFactory"/>
+    <processor class="org.apache.solr.uima.processor.UIMAUpdateRequestProcessorFactory">
+      <lst name="uimaConfig">
+        <lst name="runtimeParameters">
+          <int name="ngramsize">3</int>
+        </lst>
+        <str name="analysisEngine">/TestAE.xml</str>
+        <lst name="analyzeFields">
+          <bool name="merge">false</bool>
+          <arr name="fields">
+            <str>text</str>
+          </arr>
+        </lst>
+        <lst name="fieldMappings">
+          <lst name="mapping">
+            <str name="type">org.apache.uima.SentenceAnnotation</str>
+            <str name="feature">coveredText</str>
+            <str name="field">sentence</str>
+          </lst>
+          <lst name="mapping">
+            <str name="type">org.apache.solr.uima.ts.SentimentAnnotation</str>
+            <str name="feature">mood</str>
+            <str name="field">sentiment</str>
+          </lst>
+          <lst name="mapping">
+            <str name="type">org.apache.solr.uima.ts.EntityAnnotation</str>
+            <str name="feature">coveredText</str>
+            <str name="field">entity</str>
+          </lst>
+        </lst>
+      </lst>
+    </processor>
     <processor class="solr.LogUpdateProcessorFactory" />
     <processor class="solr.RunUpdateProcessorFactory" />
   </updateRequestProcessorChain>
   <!--
     queryResponseWriter plugins... query responses will be written using
     the writer specified by the 'wt' request parameter matching the name
@@ -1062,23 +1054,4 @@
     -->
   </admin>
-  <uimaConfig>
-    <runtimeParameters>
-      <ngramsize>3</ngramsize>
-    </runtimeParameters>
-    <analysisEngine>/TestAE.xml</analysisEngine>
-    <analyzeFields merge="false">text</analyzeFields>
-    <fieldMapping>
-      <type name="org.apache.uima.SentenceAnnotation">
-        <map feature="coveredText" field="sentence"/>
-      </type>
-      <type name="org.apache.solr.uima.ts.SentimentAnnotation">
-        <map feature="mood" field="sentiment"/>
-      </type>
-      <type name="org.apache.solr.uima.ts.EntityAnnotation">
-        <map feature="coveredText" field="entity"/>
-      </type>
-    </fieldMapping>
-  </uimaConfig>
 </config>