SOLR-3650: checkpoint, migrated CHANGES.txt for contrib/uima and contrib/extraction

git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1367384 13f79535-47bb-0310-9956-ffa450edef68
This commit is contained in:
Chris M. Hostetter 2012-07-31 01:37:17 +00:00
parent 6961d9f589
commit 1cac548005
5 changed files with 97 additions and 197 deletions

View File

@ -1045,6 +1045,11 @@ New Features
support numeric collation, ignore punctuation/whitespace, ignore accents but support numeric collation, ignore punctuation/whitespace, ignore accents but
not case, control whether upper/lowercase values are sorted first, etc. (rmuir) not case, control whether upper/lowercase values are sorted first, etc. (rmuir)
* SOLR-2346: Add a chance to set content encoding explicitly via content type
of stream for extracting request handler. This is convenient when Tika's
auto detector cannot detect encoding, especially the text file is too short
to detect encoding. (koji)
Optimizations Optimizations
---------------------- ----------------------
* SOLR-1931: Speedup for LukeRequestHandler and admin/schema browser. New parameter * SOLR-1931: Speedup for LukeRequestHandler and admin/schema browser. New parameter
@ -1286,6 +1291,12 @@ Other Changes
repository). Also updated dependencies jackson-core-asl and repository). Also updated dependencies jackson-core-asl and
jackson-mapper-asl (both v1.5.2 -> v1.7.4). (Dawid Weiss, Steve Rowe) jackson-mapper-asl (both v1.5.2 -> v1.7.4). (Dawid Weiss, Steve Rowe)
* SOLR-3295: netcdf jar is excluded from the binary release (and disabled in
ivy.xml) because it requires java 6. If you want to parse this content with
extracting request handler and are willing to use java 6, just add the jar.
(rmuir)
Build Build
---------------------- ----------------------
* SOLR-2487: Add build target to package war without slf4j jars (janhoy) * SOLR-2487: Add build target to package war without slf4j jars (janhoy)
@ -1419,6 +1430,9 @@ Bug Fixes
* SOLR-2591: Remove commitLockTimeout option from solrconfig.xml (Luca Cavanna via Martijn van Groningen) * SOLR-2591: Remove commitLockTimeout option from solrconfig.xml (Luca Cavanna via Martijn van Groningen)
* SOLR-2746: Upgraded UIMA dependencies from *-2.3.1-SNAPSHOT.jar to *-2.3.1.jar.
================== 3.4.0 ================== ================== 3.4.0 ==================
Upgrading from Solr 3.3 Upgrading from Solr 3.3
@ -1577,6 +1591,9 @@ Bug Fixes
* SOLR-2629: Eliminate deprecation warnings in some JSPs. * SOLR-2629: Eliminate deprecation warnings in some JSPs.
(Bernd Fehling, hossman) (Bernd Fehling, hossman)
* SOLR-2743: Remove commons logging from contrib/extraction. (koji)
Build Build
---------------------- ----------------------
@ -1648,6 +1665,13 @@ New Features
* SOLR-2610 -- Add an option to delete index through CoreAdmin UNLOAD action (shalin) * SOLR-2610 -- Add an option to delete index through CoreAdmin UNLOAD action (shalin)
* SOLR-2480: Add ignoreTikaException flag to the extraction request handler so
that users can ignore TikaException but index meta data.
(Shinichiro Abe, koji)
* SOLR-2582: Use uniqueKey for error log in UIMAUpdateRequestProcessor.
(Tommaso Teofili via koji)
Optimizations Optimizations
---------------------- ----------------------
@ -1667,6 +1691,12 @@ Bug Fixes
parameter is added to avoid excessive CPU time in extreme cases (e.g. long parameter is added to avoid excessive CPU time in extreme cases (e.g. long
queries with many misspelled words). (James Dyer via rmuir) queries with many misspelled words). (James Dyer via rmuir)
* SOLR-2579: UIMAUpdateRequestProcessor ignore error fails if text.length() < 100.
(Elmer Garduno via koji)
* SOLR-2581: UIMAToSolrMapper wrongly instantiates Type with reflection.
(Tommaso Teofili via koji)
Other Changes Other Changes
---------------------- ----------------------
@ -1706,6 +1736,10 @@ Upgrading from Solr 3.1
with update.chain rather than update.processor. The latter still works, with update.chain rather than update.processor. The latter still works,
but has been deprecated. but has been deprecated.
* <uimaConfig/> just beneath <config> ... </config> is no longer supported.
It should move to UIMAUpdateRequestProcessorFactory setting.
See contrib/uima/README.txt for more details. (SOLR-2436)
Detailed Change List Detailed Change List
---------------------- ----------------------
@ -1732,6 +1766,12 @@ New Features
for clustering (SOLR-2450), output of cluster scores (SOLR-2505) for clustering (SOLR-2450), output of cluster scores (SOLR-2505)
(Stanislaw Osinski, Dawid Weiss). (Stanislaw Osinski, Dawid Weiss).
* SOLR-2503: extend UIMAUpdateRequestProcessorFactory mapping function to
map feature value to dynamicField. (koji)
* SOLR-2512: add ignoreErrors flag to UIMAUpdateRequestProcessorFactory so
that users can ignore exceptions in AE. (Tommaso Teofili, koji)
Optimizations Optimizations
---------------------- ----------------------
@ -1818,6 +1858,12 @@ Other Changes
* SOLR-2528: Remove default="true" from HtmlEncoder in example solrconfig.xml, * SOLR-2528: Remove default="true" from HtmlEncoder in example solrconfig.xml,
because html encoding confuses non-ascii users. (koji) because html encoding confuses non-ascii users. (koji)
* SOLR-2387: add mock annotators for improved testing in contrib/uima,
(Tommaso Teofili via rmuir)
* SOLR-2436: move uimaConfig to under the uima's update processor in
solrconfig.xml. (Tommaso Teofili, koji)
Build Build
---------------------- ----------------------
@ -2377,6 +2423,12 @@ Bug Fixes
* SOLR-1692: Fix bug in clustering component relating to carrot.produceSummary * SOLR-1692: Fix bug in clustering component relating to carrot.produceSummary
option (gsingers) option (gsingers)
* SOLR-1756: The date.format setting for extraction request handler causes
ClassCastException when enabled and the config code that parses this setting
does not properly use the same iterator instance.
(Christoph Brill, Mark Miller)
Other Changes Other Changes
---------------------- ----------------------
@ -2504,6 +2556,10 @@ Other Changes
* SOLR-141: Errors and Exceptions are formated by ResponseWriter. * SOLR-141: Errors and Exceptions are formated by ResponseWriter.
(Mike Sokolov, Rich Cariens, Daniel Naber, ryan) (Mike Sokolov, Rich Cariens, Daniel Naber, ryan)
* SOLR-1902: Upgraded to Tika 0.8 and changed deprecated parse call
* SOLR-1813: Add ICU4j to contrib/extraction libs and add tests for Arabic
extraction (Robert Muir via gsingers)
Build Build
---------------------- ----------------------
@ -2874,6 +2930,11 @@ New Features
84. SOLR-1449: Add <lib> elements to solrconfig.xml to specifying additional 84. SOLR-1449: Add <lib> elements to solrconfig.xml to specifying additional
classpath directories and regular expressions. (hossman via yonik) classpath directories and regular expressions. (hossman via yonik)
85. SOLR-1128: Added metadata output to extraction request handler "extract
only" option. (gsingers)
86. SOLR-1274: Added text serialization output for extractOnly
(Peter Wolanin, gsingers)
Optimizations Optimizations
---------------------- ----------------------
@ -3288,6 +3349,14 @@ Other Changes
50. SOLR-1357 SolrInputDocument cannot process dynamic fields (Lars Grote via noble) 50. SOLR-1357 SolrInputDocument cannot process dynamic fields (Lars Grote via noble)
51. SOLR-1075: Upgrade to Tika 0.3. See http://www.apache.org/dist/lucene/tika/CHANGES-0.3.txt (gsingers)
52. SOLR-1310: Upgrade to Tika 0.4. Note there are some differences in
detecting Languages now in extracting request handler.
See http://www.lucidimagination.com/search/document/d6f1899a85b2a45c/vote_apache_tika_0_4_release_candidate_2#d6f1899a85b2a45c
for discussion on language detection.
See http://www.apache.org/dist/lucene/tika/CHANGES-0.4.txt. (gsingers)
Build Build
---------------------- ----------------------
1. SOLR-776: Added in ability to sign artifacts via Ant for releases (gsingers) 1. SOLR-776: Added in ability to sign artifacts via Ant for releases (gsingers)

View File

@ -1,97 +0,0 @@
Apache Solr Content Extraction Library (Solr Cell)
Release Notes
This file describes changes to the Solr Cell (contrib/extraction) module. See SOLR-284 for details.
Introduction
------------
Apache Solr Extraction provides a means for extracting and indexing content contained in "rich" documents, such
as Microsoft Word, Adobe PDF, etc. (Each name is a trademark of their respective owners) This contrib module
uses Apache Tika to extract content and metadata from the files, which can then be indexed. For more information,
see http://wiki.apache.org/solr/ExtractingRequestHandler
Getting Started
---------------
You will need Solr up and running. Then, simply add the extraction JAR file, plus the Tika dependencies (in the ./lib folder)
to your Solr Home lib directory. See http://wiki.apache.org/solr/ExtractingRequestHandler for more details on hooking it in
and configuring.
Tika Dependency
---------------
Current Version: Tika 1.1 (released 2012-03-23)
$Id$
================== Release 5.0.0 ==============
(No changes)
================== Release 4.0.0-ALPHA ==============
* SOLR-3254: Upgrade Solr to Tika 1.1 (janhoy)
================== Release 3.6.1 ==================
(No Changes)
================== Release 3.6.0 ==================
* SOLR-2346: Add a chance to set content encoding explicitly via content type of stream.
This is convenient when Tika's auto detector cannot detect encoding, especially
the text file is too short to detect encoding. (koji)
* SOLR-2901: Upgrade Solr to Tika 1.0 (janhoy)
* SOLR-3295: netcdf jar is excluded from the binary release (and disabled in ivy.xml)
because it requires java 6. If you want to parse this content and are willing to
use java 6, just add the jar. (rmuir)
================== Release 3.5.0 ==================
* SOLR-2372: Upgrade Solr to Tika 0.10 (janhoy)
================== Release 3.4.0 ==================
* SOLR-2540: CommitWithin as an Update Request parameter
You can now specify &commitWithin=N (ms) on the update request (janhoy)
* SOLR-2743: Remove commons logging. (koji)
================== Release 3.3.0 ==================
(No Changes)
================== Release 3.2.0 ==================
* SOLR-2480: Add ignoreTikaException flag so that users can ignore TikaException but index
meta data. (Shinichiro Abe, koji)
================== Release 3.1.0 ==================
* SOLR-1902: Upgraded to Tika 0.8 and changed deprecated parse call
* SOLR-1756: The date.format setting causes ClassCastException when enabled and the config code that
parses this setting does not properly use the same iterator instance. (Christoph Brill, Mark Miller)
* SOLR-18913: Add ICU4j to libs and add tests for Arabic extraction (Robert Muir via gsingers)
* SOLR-1902: Upgraded to Tika 0.8-SNAPSHOT to incorporate passing in Solr's custom ClassLoader (gsingers)
================== Release 1.4.0 ==================
1. SOLR-284: Added in support for extraction. (Eric Pugh, Chris Harris, gsingers)
2. SOLR-284: Removed "silent success" key generation (gsingers)
3. SOLR-1075: Upgrade to Tika 0.3. See http://www.apache.org/dist/lucene/tika/CHANGES-0.3.txt (gsingers)
4. SOLR-1128: Added metadata output to "extract only" option. (gsingers)
5. SOLR-1310: Upgrade to Tika 0.4. Note there are some differences in detecting Languages now.
See http://www.lucidimagination.com/search/document/d6f1899a85b2a45c/vote_apache_tika_0_4_release_candidate_2#d6f1899a85b2a45c
for discussion on language detection.
See http://www.apache.org/dist/lucene/tika/CHANGES-0.4.txt. (gsingers)
6. SOLR-1274: Added text serialization output for extractOnly (Peter Wolanin, gsingers)

View File

@ -0,0 +1,16 @@
Apache Solr Content Extraction Library (Solr Cell)
Introduction
------------
Apache Solr Extraction provides a means for extracting and indexing content contained in "rich" documents, such
as Microsoft Word, Adobe PDF, etc. (Each name is a trademark of their respective owners) This contrib module
uses Apache Tika to extract content and metadata from the files, which can then be indexed. For more information,
see http://wiki.apache.org/solr/ExtractingRequestHandler
Getting Started
---------------
You will need Solr up and running. Then, simply add the extraction JAR file, plus the Tika dependencies (in the ./lib folder)
to your Solr Home lib directory. See http://wiki.apache.org/solr/ExtractingRequestHandler for more details on hooking it in
and configuring.

View File

@ -1,100 +0,0 @@
Apache Solr UIMA Metadata Extraction Library
Release Notes
This file describes changes to the Solr UIMA (contrib/uima) module. See SOLR-2129 for details.
Introduction
------------
This module is intended to be used both as an UpdateRequestProcessor while indexing documents and as a set of tokenizer/filters
to be configured inside the schema.xml for use during analysis phase.
UIMAUpdateRequestProcessor purpose is to provide additional on the fly automatically generated fields to the Solr index.
Such fields could be language, concepts, keywords, sentences, named entities, etc.
UIMA based tokenizers/filters can be used either inside plain Lucene or as index/query analyzers to be defined
inside the schema.xml of a Solr core to create/filter tokens using specific UIMA annotations.
UIMA Dependency
---------------
uimaj-core v2.3.1
OpenCalaisAnnotator v2.3.1
HMMTagger v2.3.1
AlchemyAPIAnnotator v2.3.1
WhitespaceTokenizer v2.3.1
$Id$
================== 5.0.0 ==============
(No Changes)
================== 4.0.0-ALPHA ==============
(No Changes)
================== 3.6.1 ==================
(No Changes)
================== 3.6.0 ==================
(No Changes)
================== 3.5.0 ==================
Other Changes
----------------------
* SOLR-2746: Upgraded dependencies from *-2.3.1-SNAPSHOT.jar to *-2.3.1.jar.
================== 3.4.0 ==================
(No Changes)
================== 3.3.0 ==================
New Features
----------------------
* SOLR-2582: Use uniqueKey for error log in UIMAUpdateRequestProcessor.
(Tommaso Teofili via koji)
Bug Fixes
----------------------
* SOLR-2579: UIMAUpdateRequestProcessor ignore error fails if text.length() < 100.
(Elmer Garduno via koji)
* SOLR-2581: UIMAToSolrMapper wrongly instantiates Type with reflection.
(Tommaso Teofili via koji)
================== 3.2.0 ==================
Upgrading from Solr 3.1
----------------------
* <uimaConfig/> just beneath <config> ... </config> is no longer supported.
It should move to UIMAUpdateRequestProcessorFactory setting.
See contrib/uima/README.txt for more details. (SOLR-2436)
New Features
----------------------
* SOLR-2503: extend mapping function to map feature value to dynamicField. (koji)
* SOLR-2512: add ignoreErrors flag so that users can ignore exceptions in AE.
(Tommaso Teofili, koji)
Test Cases:
----------------------
* SOLR-2387: add mock annotators for improved testing,
(Tommaso Teofili via rmuir)
Other Changes
----------------------
* SOLR-2436: move uimaConfig to under the uima's update processor in solrconfig.xml.
(Tommaso Teofili, koji)
================== 3.1.0 ==================
Initial Release

View File

@ -1,3 +1,15 @@
Apache Solr UIMA Metadata Extraction Library
Introduction
------------
This module is intended to be used both as an UpdateRequestProcessor while indexing documents and as a set of tokenizer/filters
to be configured inside the schema.xml for use during analysis phase.
UIMAUpdateRequestProcessor purpose is to provide additional on the fly automatically generated fields to the Solr index.
Such fields could be language, concepts, keywords, sentences, named entities, etc.
UIMA based tokenizers/filters can be used either inside plain Lucene or as index/query analyzers to be defined
inside the schema.xml of a Solr core to create/filter tokens using specific UIMA annotations.
Getting Started Getting Started
--------------- ---------------
To start using Solr UIMA Metadata Extraction Library you should go through the following configuration steps: To start using Solr UIMA Metadata Extraction Library you should go through the following configuration steps: