mirror of https://github.com/apache/lucene.git
SOLR-3650: checkpoint, migrated CHANGES.txt for contrib/uima and contrib/extraction
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1367384 13f79535-47bb-0310-9956-ffa450edef68
This commit is contained in:
parent
6961d9f589
commit
1cac548005
|
@ -1045,6 +1045,11 @@ New Features
|
||||||
support numeric collation, ignore punctuation/whitespace, ignore accents but
|
support numeric collation, ignore punctuation/whitespace, ignore accents but
|
||||||
not case, control whether upper/lowercase values are sorted first, etc. (rmuir)
|
not case, control whether upper/lowercase values are sorted first, etc. (rmuir)
|
||||||
|
|
||||||
|
* SOLR-2346: Add a chance to set content encoding explicitly via content type
|
||||||
|
of stream for extracting request handler. This is convenient when Tika's
|
||||||
|
auto detector cannot detect encoding, especially the text file is too short
|
||||||
|
to detect encoding. (koji)
|
||||||
|
|
||||||
Optimizations
|
Optimizations
|
||||||
----------------------
|
----------------------
|
||||||
* SOLR-1931: Speedup for LukeRequestHandler and admin/schema browser. New parameter
|
* SOLR-1931: Speedup for LukeRequestHandler and admin/schema browser. New parameter
|
||||||
|
@ -1286,6 +1291,12 @@ Other Changes
|
||||||
repository). Also updated dependencies jackson-core-asl and
|
repository). Also updated dependencies jackson-core-asl and
|
||||||
jackson-mapper-asl (both v1.5.2 -> v1.7.4). (Dawid Weiss, Steve Rowe)
|
jackson-mapper-asl (both v1.5.2 -> v1.7.4). (Dawid Weiss, Steve Rowe)
|
||||||
|
|
||||||
|
* SOLR-3295: netcdf jar is excluded from the binary release (and disabled in
|
||||||
|
ivy.xml) because it requires java 6. If you want to parse this content with
|
||||||
|
extracting request handler and are willing to use java 6, just add the jar.
|
||||||
|
(rmuir)
|
||||||
|
|
||||||
|
|
||||||
Build
|
Build
|
||||||
----------------------
|
----------------------
|
||||||
* SOLR-2487: Add build target to package war without slf4j jars (janhoy)
|
* SOLR-2487: Add build target to package war without slf4j jars (janhoy)
|
||||||
|
@ -1419,6 +1430,9 @@ Bug Fixes
|
||||||
|
|
||||||
* SOLR-2591: Remove commitLockTimeout option from solrconfig.xml (Luca Cavanna via Martijn van Groningen)
|
* SOLR-2591: Remove commitLockTimeout option from solrconfig.xml (Luca Cavanna via Martijn van Groningen)
|
||||||
|
|
||||||
|
* SOLR-2746: Upgraded UIMA dependencies from *-2.3.1-SNAPSHOT.jar to *-2.3.1.jar.
|
||||||
|
|
||||||
|
|
||||||
================== 3.4.0 ==================
|
================== 3.4.0 ==================
|
||||||
|
|
||||||
Upgrading from Solr 3.3
|
Upgrading from Solr 3.3
|
||||||
|
@ -1577,6 +1591,9 @@ Bug Fixes
|
||||||
* SOLR-2629: Eliminate deprecation warnings in some JSPs.
|
* SOLR-2629: Eliminate deprecation warnings in some JSPs.
|
||||||
(Bernd Fehling, hossman)
|
(Bernd Fehling, hossman)
|
||||||
|
|
||||||
|
* SOLR-2743: Remove commons logging from contrib/extraction. (koji)
|
||||||
|
|
||||||
|
|
||||||
Build
|
Build
|
||||||
----------------------
|
----------------------
|
||||||
|
|
||||||
|
@ -1648,6 +1665,13 @@ New Features
|
||||||
|
|
||||||
* SOLR-2610 -- Add an option to delete index through CoreAdmin UNLOAD action (shalin)
|
* SOLR-2610 -- Add an option to delete index through CoreAdmin UNLOAD action (shalin)
|
||||||
|
|
||||||
|
* SOLR-2480: Add ignoreTikaException flag to the extraction request handler so
|
||||||
|
that users can ignore TikaException but index meta data.
|
||||||
|
(Shinichiro Abe, koji)
|
||||||
|
|
||||||
|
* SOLR-2582: Use uniqueKey for error log in UIMAUpdateRequestProcessor.
|
||||||
|
(Tommaso Teofili via koji)
|
||||||
|
|
||||||
Optimizations
|
Optimizations
|
||||||
----------------------
|
----------------------
|
||||||
|
|
||||||
|
@ -1667,6 +1691,12 @@ Bug Fixes
|
||||||
parameter is added to avoid excessive CPU time in extreme cases (e.g. long
|
parameter is added to avoid excessive CPU time in extreme cases (e.g. long
|
||||||
queries with many misspelled words). (James Dyer via rmuir)
|
queries with many misspelled words). (James Dyer via rmuir)
|
||||||
|
|
||||||
|
* SOLR-2579: UIMAUpdateRequestProcessor ignore error fails if text.length() < 100.
|
||||||
|
(Elmer Garduno via koji)
|
||||||
|
|
||||||
|
* SOLR-2581: UIMAToSolrMapper wrongly instantiates Type with reflection.
|
||||||
|
(Tommaso Teofili via koji)
|
||||||
|
|
||||||
Other Changes
|
Other Changes
|
||||||
----------------------
|
----------------------
|
||||||
|
|
||||||
|
@ -1706,6 +1736,10 @@ Upgrading from Solr 3.1
|
||||||
with update.chain rather than update.processor. The latter still works,
|
with update.chain rather than update.processor. The latter still works,
|
||||||
but has been deprecated.
|
but has been deprecated.
|
||||||
|
|
||||||
|
* <uimaConfig/> just beneath <config> ... </config> is no longer supported.
|
||||||
|
It should move to UIMAUpdateRequestProcessorFactory setting.
|
||||||
|
See contrib/uima/README.txt for more details. (SOLR-2436)
|
||||||
|
|
||||||
Detailed Change List
|
Detailed Change List
|
||||||
----------------------
|
----------------------
|
||||||
|
|
||||||
|
@ -1732,6 +1766,12 @@ New Features
|
||||||
for clustering (SOLR-2450), output of cluster scores (SOLR-2505)
|
for clustering (SOLR-2450), output of cluster scores (SOLR-2505)
|
||||||
(Stanislaw Osinski, Dawid Weiss).
|
(Stanislaw Osinski, Dawid Weiss).
|
||||||
|
|
||||||
|
* SOLR-2503: extend UIMAUpdateRequestProcessorFactory mapping function to
|
||||||
|
map feature value to dynamicField. (koji)
|
||||||
|
|
||||||
|
* SOLR-2512: add ignoreErrors flag to UIMAUpdateRequestProcessorFactory so
|
||||||
|
that users can ignore exceptions in AE. (Tommaso Teofili, koji)
|
||||||
|
|
||||||
Optimizations
|
Optimizations
|
||||||
----------------------
|
----------------------
|
||||||
|
|
||||||
|
@ -1818,6 +1858,12 @@ Other Changes
|
||||||
* SOLR-2528: Remove default="true" from HtmlEncoder in example solrconfig.xml,
|
* SOLR-2528: Remove default="true" from HtmlEncoder in example solrconfig.xml,
|
||||||
because html encoding confuses non-ascii users. (koji)
|
because html encoding confuses non-ascii users. (koji)
|
||||||
|
|
||||||
|
* SOLR-2387: add mock annotators for improved testing in contrib/uima,
|
||||||
|
(Tommaso Teofili via rmuir)
|
||||||
|
|
||||||
|
* SOLR-2436: move uimaConfig to under the uima's update processor in
|
||||||
|
solrconfig.xml. (Tommaso Teofili, koji)
|
||||||
|
|
||||||
Build
|
Build
|
||||||
----------------------
|
----------------------
|
||||||
|
|
||||||
|
@ -2377,6 +2423,12 @@ Bug Fixes
|
||||||
* SOLR-1692: Fix bug in clustering component relating to carrot.produceSummary
|
* SOLR-1692: Fix bug in clustering component relating to carrot.produceSummary
|
||||||
option (gsingers)
|
option (gsingers)
|
||||||
|
|
||||||
|
* SOLR-1756: The date.format setting for extraction request handler causes
|
||||||
|
ClassCastException when enabled and the config code that parses this setting
|
||||||
|
does not properly use the same iterator instance.
|
||||||
|
(Christoph Brill, Mark Miller)
|
||||||
|
|
||||||
|
|
||||||
Other Changes
|
Other Changes
|
||||||
----------------------
|
----------------------
|
||||||
|
|
||||||
|
@ -2504,6 +2556,10 @@ Other Changes
|
||||||
* SOLR-141: Errors and Exceptions are formated by ResponseWriter.
|
* SOLR-141: Errors and Exceptions are formated by ResponseWriter.
|
||||||
(Mike Sokolov, Rich Cariens, Daniel Naber, ryan)
|
(Mike Sokolov, Rich Cariens, Daniel Naber, ryan)
|
||||||
|
|
||||||
|
* SOLR-1902: Upgraded to Tika 0.8 and changed deprecated parse call
|
||||||
|
|
||||||
|
* SOLR-1813: Add ICU4j to contrib/extraction libs and add tests for Arabic
|
||||||
|
extraction (Robert Muir via gsingers)
|
||||||
|
|
||||||
Build
|
Build
|
||||||
----------------------
|
----------------------
|
||||||
|
@ -2874,6 +2930,11 @@ New Features
|
||||||
84. SOLR-1449: Add <lib> elements to solrconfig.xml to specifying additional
|
84. SOLR-1449: Add <lib> elements to solrconfig.xml to specifying additional
|
||||||
classpath directories and regular expressions. (hossman via yonik)
|
classpath directories and regular expressions. (hossman via yonik)
|
||||||
|
|
||||||
|
85. SOLR-1128: Added metadata output to extraction request handler "extract
|
||||||
|
only" option. (gsingers)
|
||||||
|
|
||||||
|
86. SOLR-1274: Added text serialization output for extractOnly
|
||||||
|
(Peter Wolanin, gsingers)
|
||||||
|
|
||||||
Optimizations
|
Optimizations
|
||||||
----------------------
|
----------------------
|
||||||
|
@ -3288,6 +3349,14 @@ Other Changes
|
||||||
|
|
||||||
50. SOLR-1357 SolrInputDocument cannot process dynamic fields (Lars Grote via noble)
|
50. SOLR-1357 SolrInputDocument cannot process dynamic fields (Lars Grote via noble)
|
||||||
|
|
||||||
|
51. SOLR-1075: Upgrade to Tika 0.3. See http://www.apache.org/dist/lucene/tika/CHANGES-0.3.txt (gsingers)
|
||||||
|
|
||||||
|
52. SOLR-1310: Upgrade to Tika 0.4. Note there are some differences in
|
||||||
|
detecting Languages now in extracting request handler.
|
||||||
|
See http://www.lucidimagination.com/search/document/d6f1899a85b2a45c/vote_apache_tika_0_4_release_candidate_2#d6f1899a85b2a45c
|
||||||
|
for discussion on language detection.
|
||||||
|
See http://www.apache.org/dist/lucene/tika/CHANGES-0.4.txt. (gsingers)
|
||||||
|
|
||||||
Build
|
Build
|
||||||
----------------------
|
----------------------
|
||||||
1. SOLR-776: Added in ability to sign artifacts via Ant for releases (gsingers)
|
1. SOLR-776: Added in ability to sign artifacts via Ant for releases (gsingers)
|
||||||
|
|
|
@ -1,97 +0,0 @@
|
||||||
Apache Solr Content Extraction Library (Solr Cell)
|
|
||||||
Release Notes
|
|
||||||
|
|
||||||
This file describes changes to the Solr Cell (contrib/extraction) module. See SOLR-284 for details.
|
|
||||||
|
|
||||||
Introduction
|
|
||||||
------------
|
|
||||||
|
|
||||||
Apache Solr Extraction provides a means for extracting and indexing content contained in "rich" documents, such
|
|
||||||
as Microsoft Word, Adobe PDF, etc. (Each name is a trademark of their respective owners) This contrib module
|
|
||||||
uses Apache Tika to extract content and metadata from the files, which can then be indexed. For more information,
|
|
||||||
see http://wiki.apache.org/solr/ExtractingRequestHandler
|
|
||||||
|
|
||||||
Getting Started
|
|
||||||
---------------
|
|
||||||
You will need Solr up and running. Then, simply add the extraction JAR file, plus the Tika dependencies (in the ./lib folder)
|
|
||||||
to your Solr Home lib directory. See http://wiki.apache.org/solr/ExtractingRequestHandler for more details on hooking it in
|
|
||||||
and configuring.
|
|
||||||
|
|
||||||
Tika Dependency
|
|
||||||
---------------
|
|
||||||
|
|
||||||
Current Version: Tika 1.1 (released 2012-03-23)
|
|
||||||
|
|
||||||
$Id$
|
|
||||||
|
|
||||||
================== Release 5.0.0 ==============
|
|
||||||
|
|
||||||
(No changes)
|
|
||||||
|
|
||||||
================== Release 4.0.0-ALPHA ==============
|
|
||||||
|
|
||||||
* SOLR-3254: Upgrade Solr to Tika 1.1 (janhoy)
|
|
||||||
|
|
||||||
================== Release 3.6.1 ==================
|
|
||||||
|
|
||||||
(No Changes)
|
|
||||||
|
|
||||||
================== Release 3.6.0 ==================
|
|
||||||
|
|
||||||
* SOLR-2346: Add a chance to set content encoding explicitly via content type of stream.
|
|
||||||
This is convenient when Tika's auto detector cannot detect encoding, especially
|
|
||||||
the text file is too short to detect encoding. (koji)
|
|
||||||
|
|
||||||
* SOLR-2901: Upgrade Solr to Tika 1.0 (janhoy)
|
|
||||||
|
|
||||||
* SOLR-3295: netcdf jar is excluded from the binary release (and disabled in ivy.xml)
|
|
||||||
because it requires java 6. If you want to parse this content and are willing to
|
|
||||||
use java 6, just add the jar. (rmuir)
|
|
||||||
|
|
||||||
================== Release 3.5.0 ==================
|
|
||||||
|
|
||||||
* SOLR-2372: Upgrade Solr to Tika 0.10 (janhoy)
|
|
||||||
|
|
||||||
================== Release 3.4.0 ==================
|
|
||||||
|
|
||||||
* SOLR-2540: CommitWithin as an Update Request parameter
|
|
||||||
You can now specify &commitWithin=N (ms) on the update request (janhoy)
|
|
||||||
|
|
||||||
* SOLR-2743: Remove commons logging. (koji)
|
|
||||||
|
|
||||||
================== Release 3.3.0 ==================
|
|
||||||
|
|
||||||
(No Changes)
|
|
||||||
|
|
||||||
================== Release 3.2.0 ==================
|
|
||||||
|
|
||||||
* SOLR-2480: Add ignoreTikaException flag so that users can ignore TikaException but index
|
|
||||||
meta data. (Shinichiro Abe, koji)
|
|
||||||
|
|
||||||
================== Release 3.1.0 ==================
|
|
||||||
|
|
||||||
* SOLR-1902: Upgraded to Tika 0.8 and changed deprecated parse call
|
|
||||||
|
|
||||||
* SOLR-1756: The date.format setting causes ClassCastException when enabled and the config code that
|
|
||||||
parses this setting does not properly use the same iterator instance. (Christoph Brill, Mark Miller)
|
|
||||||
|
|
||||||
* SOLR-18913: Add ICU4j to libs and add tests for Arabic extraction (Robert Muir via gsingers)
|
|
||||||
|
|
||||||
* SOLR-1902: Upgraded to Tika 0.8-SNAPSHOT to incorporate passing in Solr's custom ClassLoader (gsingers)
|
|
||||||
|
|
||||||
================== Release 1.4.0 ==================
|
|
||||||
|
|
||||||
1. SOLR-284: Added in support for extraction. (Eric Pugh, Chris Harris, gsingers)
|
|
||||||
|
|
||||||
2. SOLR-284: Removed "silent success" key generation (gsingers)
|
|
||||||
|
|
||||||
3. SOLR-1075: Upgrade to Tika 0.3. See http://www.apache.org/dist/lucene/tika/CHANGES-0.3.txt (gsingers)
|
|
||||||
|
|
||||||
4. SOLR-1128: Added metadata output to "extract only" option. (gsingers)
|
|
||||||
|
|
||||||
5. SOLR-1310: Upgrade to Tika 0.4. Note there are some differences in detecting Languages now.
|
|
||||||
See http://www.lucidimagination.com/search/document/d6f1899a85b2a45c/vote_apache_tika_0_4_release_candidate_2#d6f1899a85b2a45c
|
|
||||||
for discussion on language detection.
|
|
||||||
See http://www.apache.org/dist/lucene/tika/CHANGES-0.4.txt. (gsingers)
|
|
||||||
|
|
||||||
6. SOLR-1274: Added text serialization output for extractOnly (Peter Wolanin, gsingers)
|
|
|
@ -0,0 +1,16 @@
|
||||||
|
Apache Solr Content Extraction Library (Solr Cell)
|
||||||
|
|
||||||
|
Introduction
|
||||||
|
------------
|
||||||
|
|
||||||
|
Apache Solr Extraction provides a means for extracting and indexing content contained in "rich" documents, such
|
||||||
|
as Microsoft Word, Adobe PDF, etc. (Each name is a trademark of their respective owners) This contrib module
|
||||||
|
uses Apache Tika to extract content and metadata from the files, which can then be indexed. For more information,
|
||||||
|
see http://wiki.apache.org/solr/ExtractingRequestHandler
|
||||||
|
|
||||||
|
Getting Started
|
||||||
|
---------------
|
||||||
|
You will need Solr up and running. Then, simply add the extraction JAR file, plus the Tika dependencies (in the ./lib folder)
|
||||||
|
to your Solr Home lib directory. See http://wiki.apache.org/solr/ExtractingRequestHandler for more details on hooking it in
|
||||||
|
and configuring.
|
||||||
|
|
|
@ -1,100 +0,0 @@
|
||||||
Apache Solr UIMA Metadata Extraction Library
|
|
||||||
Release Notes
|
|
||||||
|
|
||||||
This file describes changes to the Solr UIMA (contrib/uima) module. See SOLR-2129 for details.
|
|
||||||
|
|
||||||
Introduction
|
|
||||||
------------
|
|
||||||
This module is intended to be used both as an UpdateRequestProcessor while indexing documents and as a set of tokenizer/filters
|
|
||||||
to be configured inside the schema.xml for use during analysis phase.
|
|
||||||
UIMAUpdateRequestProcessor purpose is to provide additional on the fly automatically generated fields to the Solr index.
|
|
||||||
Such fields could be language, concepts, keywords, sentences, named entities, etc.
|
|
||||||
UIMA based tokenizers/filters can be used either inside plain Lucene or as index/query analyzers to be defined
|
|
||||||
inside the schema.xml of a Solr core to create/filter tokens using specific UIMA annotations.
|
|
||||||
|
|
||||||
UIMA Dependency
|
|
||||||
---------------
|
|
||||||
uimaj-core v2.3.1
|
|
||||||
OpenCalaisAnnotator v2.3.1
|
|
||||||
HMMTagger v2.3.1
|
|
||||||
AlchemyAPIAnnotator v2.3.1
|
|
||||||
WhitespaceTokenizer v2.3.1
|
|
||||||
|
|
||||||
$Id$
|
|
||||||
|
|
||||||
================== 5.0.0 ==============
|
|
||||||
|
|
||||||
(No Changes)
|
|
||||||
|
|
||||||
================== 4.0.0-ALPHA ==============
|
|
||||||
|
|
||||||
(No Changes)
|
|
||||||
|
|
||||||
================== 3.6.1 ==================
|
|
||||||
|
|
||||||
(No Changes)
|
|
||||||
|
|
||||||
================== 3.6.0 ==================
|
|
||||||
|
|
||||||
(No Changes)
|
|
||||||
|
|
||||||
================== 3.5.0 ==================
|
|
||||||
|
|
||||||
Other Changes
|
|
||||||
----------------------
|
|
||||||
|
|
||||||
* SOLR-2746: Upgraded dependencies from *-2.3.1-SNAPSHOT.jar to *-2.3.1.jar.
|
|
||||||
|
|
||||||
================== 3.4.0 ==================
|
|
||||||
|
|
||||||
(No Changes)
|
|
||||||
|
|
||||||
================== 3.3.0 ==================
|
|
||||||
|
|
||||||
New Features
|
|
||||||
----------------------
|
|
||||||
|
|
||||||
* SOLR-2582: Use uniqueKey for error log in UIMAUpdateRequestProcessor.
|
|
||||||
(Tommaso Teofili via koji)
|
|
||||||
|
|
||||||
Bug Fixes
|
|
||||||
----------------------
|
|
||||||
|
|
||||||
* SOLR-2579: UIMAUpdateRequestProcessor ignore error fails if text.length() < 100.
|
|
||||||
(Elmer Garduno via koji)
|
|
||||||
|
|
||||||
* SOLR-2581: UIMAToSolrMapper wrongly instantiates Type with reflection.
|
|
||||||
(Tommaso Teofili via koji)
|
|
||||||
|
|
||||||
================== 3.2.0 ==================
|
|
||||||
|
|
||||||
Upgrading from Solr 3.1
|
|
||||||
----------------------
|
|
||||||
|
|
||||||
* <uimaConfig/> just beneath <config> ... </config> is no longer supported.
|
|
||||||
It should move to UIMAUpdateRequestProcessorFactory setting.
|
|
||||||
See contrib/uima/README.txt for more details. (SOLR-2436)
|
|
||||||
|
|
||||||
New Features
|
|
||||||
----------------------
|
|
||||||
|
|
||||||
* SOLR-2503: extend mapping function to map feature value to dynamicField. (koji)
|
|
||||||
|
|
||||||
* SOLR-2512: add ignoreErrors flag so that users can ignore exceptions in AE.
|
|
||||||
(Tommaso Teofili, koji)
|
|
||||||
|
|
||||||
Test Cases:
|
|
||||||
----------------------
|
|
||||||
|
|
||||||
* SOLR-2387: add mock annotators for improved testing,
|
|
||||||
(Tommaso Teofili via rmuir)
|
|
||||||
|
|
||||||
Other Changes
|
|
||||||
----------------------
|
|
||||||
|
|
||||||
* SOLR-2436: move uimaConfig to under the uima's update processor in solrconfig.xml.
|
|
||||||
(Tommaso Teofili, koji)
|
|
||||||
|
|
||||||
================== 3.1.0 ==================
|
|
||||||
|
|
||||||
Initial Release
|
|
|
@ -1,3 +1,15 @@
|
||||||
|
Apache Solr UIMA Metadata Extraction Library
|
||||||
|
|
||||||
|
Introduction
|
||||||
|
------------
|
||||||
|
This module is intended to be used both as an UpdateRequestProcessor while indexing documents and as a set of tokenizer/filters
|
||||||
|
to be configured inside the schema.xml for use during analysis phase.
|
||||||
|
UIMAUpdateRequestProcessor purpose is to provide additional on the fly automatically generated fields to the Solr index.
|
||||||
|
Such fields could be language, concepts, keywords, sentences, named entities, etc.
|
||||||
|
UIMA based tokenizers/filters can be used either inside plain Lucene or as index/query analyzers to be defined
|
||||||
|
inside the schema.xml of a Solr core to create/filter tokens using specific UIMA annotations.
|
||||||
|
|
||||||
|
|
||||||
Getting Started
|
Getting Started
|
||||||
---------------
|
---------------
|
||||||
To start using Solr UIMA Metadata Extraction Library you should go through the following configuration steps:
|
To start using Solr UIMA Metadata Extraction Library you should go through the following configuration steps:
|
||||||
|
|
Loading…
Reference in New Issue