lucene/solr/contrib/uima/README.txt

Getting Started
---------------
To start using the Solr UIMA Metadata Extraction Library, go through the following configuration steps:

1. copy the generated solr-uima jar and its libs (under contrib/uima/lib) into a Solr libraries directory,
   or set <lib/> tags in solrconfig.xml to point to those jar files:

   <lib dir="../../contrib/uima/lib" />
   <lib dir="../../dist/" regex="apache-solr-uima-\d.*\.jar" />

2. modify your schema.xml, adding the fields that will hold the metadata, with proper values for the type, indexed, stored and multiValued options.

   for example, you could specify the following:

  <field name="language" type="string" indexed="true" stored="true" required="false"/>
  <field name="concept" type="string" indexed="true" stored="true" multiValued="true" required="false"/>
  <field name="sentence" type="text" indexed="true" stored="true" multiValued="true" required="false" />

3. modify your solrconfig.xml adding the following snippet:

  <updateRequestProcessorChain name="uima">
    <processor class="org.apache.solr.uima.processor.UIMAUpdateRequestProcessorFactory">
      <lst name="uimaConfig">
        <lst name="runtimeParameters">
          <str name="keyword_apikey">VALID_ALCHEMYAPI_KEY</str>
          <str name="concept_apikey">VALID_ALCHEMYAPI_KEY</str>
          <str name="lang_apikey">VALID_ALCHEMYAPI_KEY</str>
          <str name="cat_apikey">VALID_ALCHEMYAPI_KEY</str>
          <str name="entities_apikey">VALID_ALCHEMYAPI_KEY</str>
          <str name="oc_licenseID">VALID_OPENCALAIS_KEY</str>
        </lst>
        <str name="analysisEngine">/org/apache/uima/desc/OverridingParamsExtServicesAE.xml</str>
        <!-- Set to true if you want to continue indexing even if text processing fails.
             Default is false, in which case Solr throws a RuntimeException and
             none of the documents in your session are indexed. -->
        <bool name="ignoreErrors">true</bool>
        <!-- This is optional. It is used for logging when text processing fails.
             If logField is not specified, uniqueKey will be used as logField.
        <str name="logField">id</str>
        -->
        <lst name="analyzeFields">
          <bool name="merge">false</bool>
          <arr name="fields">
            <str>text</str>
          </arr>
        </lst>
        <lst name="fieldMappings">
          <lst name="type">
            <str name="name">org.apache.uima.alchemy.ts.concept.ConceptFS</str>
            <lst name="mapping">
              <str name="feature">text</str>
              <str name="field">concept</str>
            </lst>
          </lst>
          <lst name="type">
            <str name="name">org.apache.uima.alchemy.ts.language.LanguageFS</str>
            <lst name="mapping">
              <str name="feature">language</str>
              <str name="field">language</str>
            </lst>
          </lst>
          <lst name="type">
            <str name="name">org.apache.uima.SentenceAnnotation</str>
            <lst name="mapping">
              <str name="feature">coveredText</str>
              <str name="field">sentence</str>
            </lst>
          </lst>
        </lst>
      </lst>
    </processor>
    <processor class="solr.LogUpdateProcessorFactory" />
    <processor class="solr.RunUpdateProcessorFactory" />
  </updateRequestProcessorChain>

   where VALID_ALCHEMYAPI_KEY is your AlchemyAPI Access Key. You need to register for an AlchemyAPI Access
   Key to use the AlchemyAPI services: http://www.alchemyapi.com/api/register.html

   where VALID_OPENCALAIS_KEY is your Calais Service Key. You need to register for a Calais Service
   Key to use the Calais services: http://www.opencalais.com/apikey
  
   analysisEngine must point to an AE (Analysis Engine) descriptor available at the specified path in the classpath

   analyzeFields must list the input fields to be analyzed by UIMA;
   if merge=true, their content is merged and analyzed only once
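
   for example, a minimal sketch assuming your schema has two input fields named
   title and text (hypothetical names, use your own): with merge=true their content
   would be merged and passed to UIMA in a single analysis

  <lst name="analyzeFields">
    <bool name="merge">true</bool>
    <arr name="fields">
      <str>title</str>
      <str>text</str>
    </arr>
  </lst>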

   fieldMappings describes which features of which UIMA types should go into which Solr field

4. in your solrconfig.xml, replace the existing default (<requestHandler name="/update"...) or create a new UpdateRequestHandler with the following:
  <requestHandler name="/update" class="solr.XmlUpdateRequestHandler">
    <lst name="defaults">
      <str name="update.processor">uima</str>
    </lst>
  </requestHandler>
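
   alternatively, a sketch of a separate handler that leaves the default /update untouched;
   the name /update/uima is only an illustrative assumption, while the class and the
   update.processor parameter are the same as above

  <requestHandler name="/update/uima" class="solr.XmlUpdateRequestHandler">
    <lst name="defaults">
      <str name="update.processor">uima</str>
    </lst>
  </requestHandler>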

Once you're done with the configuration, you can index documents, which will be automatically enriched with the specified fields.
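
For instance, assuming the schema has an id uniqueKey and the text field listed in analyzeFields
(the id value and the sentence below are made up for illustration), an XML update message sent
to the /update handler might look like:

  <add>
    <doc>
      <field name="id">doc1</field>
      <field name="text">Apache Solr is an open source search platform.</field>
    </doc>
  </add>

With the configuration above, the language, concept and sentence fields would be expected to be
filled in by the UIMA processor, provided the external services respond successfully.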