SOLR-2436: move uimaConfig to under the uima's update processor in solrconfig.xml

git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1096315 13f79535-47bb-0310-9956-ffa450edef68
This commit is contained in:
Koji Sekiguchi 2011-04-24 11:48:43 +00:00
parent 8be8081a49
commit 6a85962022
8 changed files with 180 additions and 185 deletions

View File

@ -21,11 +21,25 @@ $Id$
================== 3.2.0-dev ==================
Upgrading from Solr 3.1
----------------------
* <uimaConfig/> just beneath <config> ... </config> is no longer supported.
It should move to UIMAUpdateRequestProcessorFactory setting.
See contrib/uima/README.txt for more details. (SOLR-2436)
Test Cases:
----------------------
* SOLR-2387: add mock annotators for improved testing,
(Tommaso Teofili via rmuir)
* SOLR-2387: add mock annotators for improved testing,
(Tommaso Teofili via rmuir)
================== 3.1.0-dev ==================
Other Changes
----------------------
* SOLR-2436: move uimaConfig to under the uima's update processor in solrconfig.xml.
(Tommaso Teofili, koji)
================== 3.1.0 ==================
Initial Release

View File

@ -3,38 +3,61 @@ Getting Started
To start using Solr UIMA Metadata Extraction Library you should go through the following configuration steps:
1. copy generated solr-uima jar and its libs (under contrib/uima/lib) inside a Solr libraries directory.
or set <lib/> tags in solrconfig.xml appropriately to point those jar files.
<lib dir="../../contrib/uima/lib" />
<lib dir="../../dist/" regex="apache-solr-uima-\d.*\.jar" />
2. modify your schema.xml adding the fields you want to be hold metadata specifying proper values for type, indexed, stored and multiValued options:
3. for example you could specify the following
for example you could specify the following
<field name="language" type="string" indexed="true" stored="true" required="false"/>
<field name="concept" type="string" indexed="true" stored="true" multiValued="true" required="false"/>
<field name="sentence" type="text" indexed="true" stored="true" multiValued="true" required="false" />
4. modify your solrconfig.xml adding the following snippet:
<uimaConfig>
<runtimeParameters>
<keyword_apikey>VALID_ALCHEMYAPI_KEY</keyword_apikey>
<concept_apikey>VALID_ALCHEMYAPI_KEY</concept_apikey>
<lang_apikey>VALID_ALCHEMYAPI_KEY</lang_apikey>
<cat_apikey>VALID_ALCHEMYAPI_KEY</cat_apikey>
<entities_apikey>VALID_ALCHEMYAPI_KEY</entities_apikey>
<oc_licenseID>VALID_OPENCALAIS_KEY</oc_licenseID>
</runtimeParameters>
<analysisEngine>/org/apache/uima/desc/OverridingParamsExtServicesAE.xml</analysisEngine>
<analyzeFields merge="false">text</analyzeFields>
<fieldMapping>
<type name="org.apache.uima.alchemy.ts.concept.ConceptFS">
<map feature="text" field="concept"/>
</type>
<type name="org.apache.uima.alchemy.ts.language.LanguageFS">
<map feature="language" field="language"/>
</type>
<type name="org.apache.uima.SentenceAnnotation">
<map feature="coveredText" field="sentence"/>
</type>
</fieldMapping>
</uimaConfig>
3. modify your solrconfig.xml adding the following snippet:
<updateRequestProcessorChain name="uima">
<processor class="org.apache.solr.uima.processor.UIMAUpdateRequestProcessorFactory">
<lst name="uimaConfig">
<lst name="runtimeParameters">
<str name="keyword_apikey">VALID_ALCHEMYAPI_KEY</str>
<str name="concept_apikey">VALID_ALCHEMYAPI_KEY</str>
<str name="lang_apikey">VALID_ALCHEMYAPI_KEY</str>
<str name="cat_apikey">VALID_ALCHEMYAPI_KEY</str>
<str name="entities_apikey">VALID_ALCHEMYAPI_KEY</str>
<str name="oc_licenseID">VALID_OPENCALAIS_KEY</str>
</lst>
<str name="analysisEngine">/org/apache/uima/desc/OverridingParamsExtServicesAE.xml</str>
<lst name="analyzeFields">
<bool name="merge">false</bool>
<arr name="fields">
<str>text</str>
</arr>
</lst>
<lst name="fieldMappings">
<lst name="mapping">
<str name="type">org.apache.uima.alchemy.ts.concept.ConceptFS</str>
<str name="feature">text</str>
<str name="field">concept</str>
</lst>
<lst name="mapping">
<str name="type">org.apache.uima.alchemy.ts.language.LanguageFS</str>
<str name="feature">language</str>
<str name="field">language</str>
</lst>
<lst name="mapping">
<str name="type">org.apache.uima.SentenceAnnotation</str>
<str name="feature">coveredText</str>
<str name="field">sentence</str>
</lst>
</lst>
</lst>
</processor>
<processor class="solr.LogUpdateProcessorFactory" />
<processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>
where VALID_ALCHEMYAPI_KEY is your AlchemyAPI Access Key. You need to register AlchemyAPI Access
key to exploit the AlchemyAPI services: http://www.alchemyapi.com/api/register.html
@ -42,21 +65,14 @@ To start using Solr UIMA Metadata Extraction Library you should go through the f
where VALID_OPENCALAIS_KEY is your Calais Service Key. You need to register Calais Service
key to exploit the Calais services: http://www.opencalais.com/apikey
5. the analysisEngine tag must contain an AE descriptor inside the specified path in the classpath
the analysisEngine must contain an AE descriptor inside the specified path in the classpath
6. the analyzeFields tag must contain the input fields that need to be analyzed by UIMA,
the analyzeFields must contain the input fields that need to be analyzed by UIMA,
if merge=true then their content will be merged and analyzed only once
7. field mapping describes which features of which types should go in a field
field mapping describes which features of which types should go in a field
8. define in your solrconfig.xml an UpdateRequestProcessorChain as following:
<updateRequestProcessorChain name="uima">
<processor class="org.apache.solr.uima.processor.UIMAUpdateRequestProcessorFactory"/>
<processor class="solr.LogUpdateProcessorFactory" />
<processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>
9. in your solrconfig.xml replace the existing default (<requestHandler name="/update"...) or create a new UpdateRequestHandler with the following:
4. in your solrconfig.xml replace the existing default (<requestHandler name="/update"...) or create a new UpdateRequestHandler with the following:
<requestHandler name="/update" class="solr.XmlUpdateRequestHandler">
<lst name="defaults">
<str name="update.processor">uima</str>

View File

@ -21,7 +21,7 @@ import java.util.Map;
/**
* Configuration holding all the configurable parameters for calling UIMA inside Solr
*
*
* @version $Id$
*/
public class SolrUIMAConfiguration {
@ -65,5 +65,4 @@ public class SolrUIMAConfiguration {
public Map<String, Object> getRuntimeParameters() {
return runtimeParameters;
}
}

View File

@ -18,11 +18,10 @@ package org.apache.solr.uima.processor;
*/
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import org.apache.solr.core.SolrConfig;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.apache.solr.common.util.NamedList;
/**
* Read configuration for Solr-UIMA integration
@ -32,18 +31,10 @@ import org.w3c.dom.NodeList;
*/
public class SolrUIMAConfigurationReader {
private static final String AE_RUNTIME_PARAMETERS_NODE_PATH = "/config/uimaConfig/runtimeParameters";
private NamedList<Object> args;
private static final String FIELD_MAPPING_NODE_PATH = "/config/uimaConfig/fieldMapping";
private static final String ANALYZE_FIELDS_NODE_PATH = "/config/uimaConfig/analyzeFields";
private static final String ANALYSIS_ENGINE_NODE_PATH = "/config/uimaConfig/analysisEngine";
private SolrConfig solrConfig;
public SolrUIMAConfigurationReader(SolrConfig solrConfig) {
this.solrConfig = solrConfig;
public SolrUIMAConfigurationReader(NamedList<Object> args) {
this.args = args;
}
public SolrUIMAConfiguration readSolrUIMAConfiguration() {
@ -52,73 +43,51 @@ public class SolrUIMAConfigurationReader {
}
private String readAEPath() {
return solrConfig.getNode(ANALYSIS_ENGINE_NODE_PATH, true).getTextContent();
return (String) args.get("analysisEngine");
}
@SuppressWarnings("rawtypes")
private NamedList getAnalyzeFields() {
return (NamedList) args.get("analyzeFields");
}
@SuppressWarnings("unchecked")
private String[] readFieldsToAnalyze() {
Node analyzeFieldsNode = solrConfig.getNode(ANALYZE_FIELDS_NODE_PATH, true);
return analyzeFieldsNode.getTextContent().split(",");
List<String> fields = (List<String>) getAnalyzeFields().get("fields");
return fields.toArray(new String[fields.size()]);
}
private boolean readFieldsMerging() {
Node analyzeFieldsNode = solrConfig.getNode(ANALYZE_FIELDS_NODE_PATH, true);
Node mergeNode = analyzeFieldsNode.getAttributes().getNamedItem("merge");
return Boolean.valueOf(mergeNode.getNodeValue());
return (Boolean) getAnalyzeFields().get("merge");
}
@SuppressWarnings("rawtypes")
private Map<String, Map<String, String>> readTypesFeaturesFieldsMapping() {
Map<String, Map<String, String>> map = new HashMap<String, Map<String, String>>();
Node fieldMappingNode = solrConfig.getNode(FIELD_MAPPING_NODE_PATH, true);
NamedList fieldMappings = (NamedList) args.get("fieldMappings");
/* iterate over UIMA types */
if (fieldMappingNode.hasChildNodes()) {
NodeList typeNodes = fieldMappingNode.getChildNodes();
for (int i = 0; i < typeNodes.getLength(); i++) {
/* <type> node */
Node typeNode = typeNodes.item(i);
if (typeNode.getNodeType() != Node.TEXT_NODE) {
Node typeNameAttribute = typeNode.getAttributes().getNamedItem("name");
/* get a UIMA typename */
String typeName = typeNameAttribute.getNodeValue();
/* create entry for UIMA type */
map.put(typeName, new HashMap<String, String>());
if (typeNode.hasChildNodes()) {
/* iterate over features */
NodeList featuresNodeList = typeNode.getChildNodes();
for (int j = 0; j < featuresNodeList.getLength(); j++) {
Node mappingNode = featuresNodeList.item(j);
if (mappingNode.getNodeType() != Node.TEXT_NODE) {
/* get field name */
Node fieldNameNode = mappingNode.getAttributes().getNamedItem("field");
String mappedFieldName = fieldNameNode.getNodeValue();
/* get feature name */
Node featureNameNode = mappingNode.getAttributes().getNamedItem("feature");
String featureName = featureNameNode.getNodeValue();
/* map the feature to the field for the specified type */
map.get(typeName).put(featureName, mappedFieldName);
}
}
}
}
}
for (int i = 0; i < fieldMappings.size(); i++) {
NamedList mapping = (NamedList) fieldMappings.get("mapping", i);
String typeName = (String) mapping.get("type");
String featureName = (String) mapping.get("feature");
String mappedFieldName = (String) mapping.get("field");
Map<String, String> subMap = new HashMap<String, String>();
subMap.put(featureName, mappedFieldName);
map.put(typeName, subMap);
}
return map;
}
@SuppressWarnings("rawtypes")
private Map<String, Object> readAEOverridingParameters() {
Map<String, Object> runtimeParameters = new HashMap<String, Object>();
Node uimaConfigNode = solrConfig.getNode(AE_RUNTIME_PARAMETERS_NODE_PATH, true);
if (uimaConfigNode.hasChildNodes()) {
NodeList overridingNodes = uimaConfigNode.getChildNodes();
for (int i = 0; i < overridingNodes.getLength(); i++) {
Node overridingNode = overridingNodes.item(i);
if (overridingNode.getNodeType() != Node.TEXT_NODE && overridingNode.getNodeType() != Node.COMMENT_NODE) {
runtimeParameters.put(overridingNode.getNodeName(), overridingNode.getTextContent());
}
}
NamedList runtimeParams = (NamedList) args.get("runtimeParameters");
for (int i = 0; i < runtimeParams.size(); i++) {
String name = runtimeParams.getName(i);
Object value = runtimeParams.getVal(i);
runtimeParameters.put(name, value);
}
return runtimeParameters;
}

View File

@ -34,7 +34,7 @@ import org.apache.uima.resource.ResourceInitializationException;
/**
* Update document(s) to be indexed with UIMA extracted information
*
*
* @version $Id$
*/
public class UIMAUpdateRequestProcessor extends UpdateRequestProcessor {
@ -43,15 +43,14 @@ public class UIMAUpdateRequestProcessor extends UpdateRequestProcessor {
private AEProvider aeProvider;
public UIMAUpdateRequestProcessor(UpdateRequestProcessor next, SolrCore solrCore) {
public UIMAUpdateRequestProcessor(UpdateRequestProcessor next, SolrCore solrCore,
SolrUIMAConfiguration config) {
super(next);
initialize(solrCore);
initialize(solrCore, config);
}
private void initialize(SolrCore solrCore) {
SolrUIMAConfigurationReader uimaConfigurationReader = new SolrUIMAConfigurationReader(solrCore
.getSolrConfig());
solrUIMAConfiguration = uimaConfigurationReader.readSolrUIMAConfiguration();
private void initialize(SolrCore solrCore, SolrUIMAConfiguration config) {
solrUIMAConfiguration = config;
aeProvider = AEProviderFactory.getInstance().getAEProvider(solrCore.getName(),
solrUIMAConfiguration.getAePath(), solrUIMAConfiguration.getRuntimeParameters());
}

View File

@ -17,6 +17,7 @@ package org.apache.solr.uima.processor;
* limitations under the License.
*/
import org.apache.solr.common.util.NamedList;
import org.apache.solr.request.SolrQueryRequest;
import org.apache.solr.response.SolrQueryResponse;
import org.apache.solr.update.processor.UpdateRequestProcessor;
@ -29,10 +30,19 @@ import org.apache.solr.update.processor.UpdateRequestProcessorFactory;
*/
public class UIMAUpdateRequestProcessorFactory extends UpdateRequestProcessorFactory {
private NamedList<Object> args;
@SuppressWarnings("unchecked")
@Override
public void init(@SuppressWarnings("rawtypes") NamedList args) {
this.args = (NamedList<Object>) args.get("uimaConfig");
}
@Override
public UpdateRequestProcessor getInstance(SolrQueryRequest req, SolrQueryResponse rsp,
UpdateRequestProcessor next) {
return new UIMAUpdateRequestProcessor(next, req.getCore());
return new UIMAUpdateRequestProcessor(next, req.getCore(),
new SolrUIMAConfigurationReader(args).readSolrUIMAConfiguration());
}
}

View File

@ -15,19 +15,34 @@
limitations under the License.
-->
<uimaConfig>
<runtimeParameters>
<keyword_apikey>VALID_ALCHEMYAPI_KEY</keyword_apikey>
<concept_apikey>VALID_ALCHEMYAPI_KEY</concept_apikey>
<lang_apikey>VALID_ALCHEMYAPI_KEY</lang_apikey>
<cat_apikey>VALID_ALCHEMYAPI_KEY</cat_apikey>
<oc_licenseID>VALID_OPENCALAIS_KEY</oc_licenseID>
</runtimeParameters>
<analysisEngine>/org/apache/uima/desc/OverridingParamsExtServicesAE.xml</analysisEngine>
<analyzeFields merge="false">text,title</analyzeFields>
<fieldMapping>
<type name="org.apache.uima.jcas.tcas.Annotation">
<map feature="coveredText" field="tag"/>
</type>
</fieldMapping>
</uimaConfig>
<updateRequestProcessorChain name="uima">
<processor class="org.apache.solr.uima.processor.UIMAUpdateRequestProcessorFactory">
<lst name="uimaConfig">
<lst name="runtimeParameters">
<str name="keyword_apikey">VALID_ALCHEMYAPI_KEY</str>
<str name="concept_apikey">VALID_ALCHEMYAPI_KEY</str>
<str name="lang_apikey">VALID_ALCHEMYAPI_KEY</str>
<str name="cat_apikey">VALID_ALCHEMYAPI_KEY</str>
<str name="entities_apikey">VALID_ALCHEMYAPI_KEY</str>
<str name="oc_licenseID">VALID_OPENCALAIS_KEY</str>
</lst>
<str name="analysisEngine">/org/apache/uima/desc/OverridingParamsExtServicesAE.xml</str>
<lst name="analyzeFields">
<bool name="merge">false</bool>
<arr name="fields">
<str>text</str>
<str>title</str>
</arr>
</lst>
<lst name="fieldMappings">
<lst name="mapping">
<str name="type">org.apache.uima.jcas.tcas.Annotation</str>
<str name="feature">convertText</str>
<str name="field">tag</str>
</lst>
</lst>
</lst>
</processor>
<processor class="solr.LogUpdateProcessorFactory" />
<processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>

View File

@ -953,42 +953,6 @@
</lst>
</requestHandler>
<highlighting>
<!-- Configure the standard fragmenter -->
<!-- This could most likely be commented out in the "default" case -->
<fragmenter name="gap"
class="org.apache.solr.highlight.GapFragmenter" default="true">
<lst name="defaults">
<int name="hl.fragsize">100</int>
</lst>
</fragmenter>
<!--
A regular-expression-based fragmenter (f.i., for sentence
extraction)
-->
<fragmenter name="regex"
class="org.apache.solr.highlight.RegexFragmenter">
<lst name="defaults">
<!-- slightly smaller fragsizes work better because of slop -->
<int name="hl.fragsize">70</int>
<!-- allow 50% slop on fragment sizes -->
<float name="hl.regex.slop">0.5</float>
<!-- a basic sentence pattern -->
<str name="hl.regex.pattern">[-\w ,/\n\"']{20,200}</str>
</lst>
</fragmenter>
<!-- Configure the standard formatter -->
<formatter name="html" class="org.apache.solr.highlight.HtmlFormatter"
default="true">
<lst name="defaults">
<str name="hl.simple.pre"><![CDATA[<em>]]></str>
<str name="hl.simple.post"><![CDATA[</em>]]></str>
</lst>
</formatter>
</highlighting>
<!--
An example dedup update processor that creates the "id" field on the
fly based on the hash code of some other fields. This example has
@ -1001,13 +965,41 @@
-->
<updateRequestProcessorChain name="uima">
<processor class="org.apache.solr.uima.processor.UIMAUpdateRequestProcessorFactory"/>
<processor class="org.apache.solr.uima.processor.UIMAUpdateRequestProcessorFactory">
<lst name="uimaConfig">
<lst name="runtimeParameters">
<int name="ngramsize">3</int>
</lst>
<str name="analysisEngine">/TestAE.xml</str>
<lst name="analyzeFields">
<bool name="merge">false</bool>
<arr name="fields">
<str>text</str>
</arr>
</lst>
<lst name="fieldMappings">
<lst name="mapping">
<str name="type">org.apache.uima.SentenceAnnotation</str>
<str name="feature">coveredText</str>
<str name="field">sentence</str>
</lst>
<lst name="mapping">
<str name="type">org.apache.solr.uima.ts.SentimentAnnotation</str>
<str name="feature">mood</str>
<str name="field">sentiment</str>
</lst>
<lst name="mapping">
<str name="type">org.apache.solr.uima.ts.EntityAnnotation</str>
<str name="feature">coveredText</str>
<str name="field">entity</str>
</lst>
</lst>
</lst>
</processor>
<processor class="solr.LogUpdateProcessorFactory" />
<processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>
<!--
queryResponseWriter plugins... query responses will be written using
the writer specified by the 'wt' request parameter matching the name
@ -1062,23 +1054,4 @@
-->
</admin>
<uimaConfig>
<runtimeParameters>
<ngramsize>3</ngramsize>
</runtimeParameters>
<analysisEngine>/TestAE.xml</analysisEngine>
<analyzeFields merge="false">text</analyzeFields>
<fieldMapping>
<type name="org.apache.uima.SentenceAnnotation">
<map feature="coveredText" field="sentence"/>
</type>
<type name="org.apache.solr.uima.ts.SentimentAnnotation">
<map feature="mood" field="sentiment"/>
</type>
<type name="org.apache.solr.uima.ts.EntityAnnotation">
<map feature="coveredText" field="entity"/>
</type>
</fieldMapping>
</uimaConfig>
</config>