SOLR-3650: checkpoint, migrated CHANGES.txt for contrib/uima and contrib/extraction

git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1367384 13f79535-47bb-0310-9956-ffa450edef68
2025-02-23 10:51:29 +00:00 · 2012-07-31 01:37:17 +00:00 · 2012-07-31 01:37:17 +00:00 · 1cac548005
commit 1cac548005
parent 6961d9f589
5 changed files with 97 additions and 197 deletions
--- a/solr/CHANGES.txt
+++ b/solr/CHANGES.txt
@@ -1045,6 +1045,11 @@ New Features
  support numeric collation, ignore punctuation/whitespace, ignore accents but
  not case, control whether upper/lowercase values are sorted first, etc.  (rmuir)

+* SOLR-2346: Allow the content encoding to be set explicitly via the content
+  type of the stream for the extracting request handler.  This is convenient
+  when Tika's auto detector cannot detect the encoding, especially when the
+  text file is too short for detection. (koji)
+
 Optimizations
 ----------------------
 * SOLR-1931: Speedup for LukeRequestHandler and admin/schema browser. New parameter
@@ -1286,6 +1291,12 @@ Other Changes
  repository).  Also updated dependencies jackson-core-asl and
  jackson-mapper-asl (both v1.5.2 -> v1.7.4).  (Dawid Weiss, Steve Rowe)

+* SOLR-3295: netcdf jar is excluded from the binary release (and disabled in
+  ivy.xml) because it requires Java 6. If you want to parse this content with
+  the extracting request handler and are willing to use Java 6, just add the jar.
+  (rmuir)
+
+
 Build
 ----------------------
 * SOLR-2487: Add build target to package war without slf4j jars (janhoy)
@@ -1419,6 +1430,9 @@ Bug Fixes

 * SOLR-2591: Remove commitLockTimeout option from solrconfig.xml (Luca Cavanna via Martijn van Groningen)

+* SOLR-2746: Upgraded UIMA dependencies from *-2.3.1-SNAPSHOT.jar to *-2.3.1.jar.
+
+
 ==================  3.4.0  ==================

 Upgrading from Solr 3.3
@@ -1577,6 +1591,9 @@ Bug Fixes
 * SOLR-2629: Eliminate deprecation warnings in some JSPs.
  (Bernd Fehling, hossman)

+* SOLR-2743: Remove commons logging from contrib/extraction. (koji)
+
+
 Build
 ----------------------

@@ -1648,6 +1665,13 @@ New Features

 * SOLR-2610 -- Add an option to delete index through CoreAdmin UNLOAD action (shalin)

+* SOLR-2480: Add ignoreTikaException flag to the extraction request handler so
+  that users can ignore a TikaException but still index metadata.
+  (Shinichiro Abe, koji)
+
+* SOLR-2582: Use uniqueKey for error log in UIMAUpdateRequestProcessor.
+  (Tommaso Teofili via koji)
+
 Optimizations
 ----------------------

@@ -1667,6 +1691,12 @@ Bug Fixes
  parameter is added to avoid excessive CPU time in extreme cases (e.g. long
  queries with many misspelled words).  (James Dyer via rmuir)

+* SOLR-2579: UIMAUpdateRequestProcessor ignore error fails if text.length() < 100.
+  (Elmer Garduno via koji)
+
+* SOLR-2581: UIMAToSolrMapper wrongly instantiates Type with reflection.
+  (Tommaso Teofili via koji)
+
 Other Changes
 ----------------------

@@ -1706,6 +1736,10 @@ Upgrading from Solr 3.1
  with update.chain rather than update.processor. The latter still works,
  but has been deprecated.

+* <uimaConfig/> just beneath <config> ... </config> is no longer supported.
+  It should be moved into the UIMAUpdateRequestProcessorFactory settings.
+  See contrib/uima/README.txt for more details. (SOLR-2436)
+
 Detailed Change List
 ----------------------

@@ -1732,6 +1766,12 @@ New Features
  for clustering (SOLR-2450), output of cluster scores (SOLR-2505)
  (Stanislaw Osinski, Dawid Weiss).

+* SOLR-2503: extend UIMAUpdateRequestProcessorFactory mapping function to 
+  map feature value to dynamicField. (koji)
+
+* SOLR-2512: add ignoreErrors flag to UIMAUpdateRequestProcessorFactory so 
+  that users can ignore exceptions in AE. (Tommaso Teofili, koji)
+
 Optimizations
 ----------------------

@@ -1818,6 +1858,12 @@ Other Changes
 * SOLR-2528: Remove default="true" from HtmlEncoder in example solrconfig.xml,
  because html encoding confuses non-ascii users. (koji)

+* SOLR-2387: add mock annotators for improved testing in contrib/uima.
+  (Tommaso Teofili via rmuir)
+
+* SOLR-2436: move uimaConfig under the UIMA update processor in
+  solrconfig.xml.  (Tommaso Teofili, koji)
+
 Build
 ----------------------

@@ -2377,6 +2423,12 @@ Bug Fixes
 * SOLR-1692: Fix bug in clustering component relating to carrot.produceSummary 
  option (gsingers)

+* SOLR-1756: The date.format setting for the extraction request handler causes a
+  ClassCastException when enabled, and the config code that parses this setting
+  does not properly use the same iterator instance.
+  (Christoph Brill, Mark Miller)
+
+
 Other Changes
 ----------------------

@@ -2504,6 +2556,10 @@ Other Changes
 * SOLR-141: Errors and Exceptions are formatted by ResponseWriter.
  (Mike Sokolov, Rich Cariens, Daniel Naber, ryan)

+* SOLR-1902: Upgraded to Tika 0.8 and changed deprecated parse call
+
+* SOLR-1813: Add ICU4j to contrib/extraction libs and add tests for Arabic 
+  extraction (Robert Muir via gsingers)

 Build
 ----------------------
@@ -2874,6 +2930,11 @@ New Features
 84. SOLR-1449: Add <lib> elements to solrconfig.xml to specify additional
    classpath directories and regular expressions. (hossman via yonik)

+85. SOLR-1128: Added metadata output to extraction request handler "extract 
+    only" option.  (gsingers)
+
+86. SOLR-1274: Added text serialization output for extractOnly 
+    (Peter Wolanin, gsingers)  

 Optimizations
 ----------------------
@@ -3288,6 +3349,14 @@ Other Changes

 50. SOLR-1357 SolrInputDocument cannot process dynamic fields (Lars Grote via noble)

+51. SOLR-1075: Upgrade to Tika 0.3.  See http://www.apache.org/dist/lucene/tika/CHANGES-0.3.txt (gsingers)
+
+52. SOLR-1310: Upgrade to Tika 0.4. Note there are now some differences in
+    language detection in the extracting request handler.
+    See http://www.lucidimagination.com/search/document/d6f1899a85b2a45c/vote_apache_tika_0_4_release_candidate_2#d6f1899a85b2a45c
+    for discussion on language detection.
+    See http://www.apache.org/dist/lucene/tika/CHANGES-0.4.txt. (gsingers)
+
 Build
 ----------------------
 1. SOLR-776: Added in ability to sign artifacts via Ant for releases (gsingers)
--- a/solr/contrib/extraction/CHANGES.txt
+++ b/solr/contrib/extraction/CHANGES.txt
@@ -1,97 +0,0 @@
-Apache Solr Content Extraction Library (Solr Cell)
-                            Release Notes
-
-This file describes changes to the Solr Cell (contrib/extraction) module.  See SOLR-284 for details.
-
-Introduction
------------
-
-Apache Solr Extraction provides a means for extracting and indexing content contained in "rich" documents, such
-as Microsoft Word, Adobe PDF, etc.  (Each name is a trademark of its respective owner.)  This contrib module
-uses Apache Tika to extract content and metadata from the files, which can then be indexed.  For more information,
-see http://wiki.apache.org/solr/ExtractingRequestHandler
-
-Getting Started
---------------
-You will need Solr up and running.  Then, simply add the extraction JAR file, plus the Tika dependencies (in the ./lib folder)
-to your Solr Home lib directory.  See http://wiki.apache.org/solr/ExtractingRequestHandler for more details on hooking it in
-and configuring it.
-
-Tika Dependency
---------------
-
-Current Version: Tika 1.1 (released 2012-03-23)
-
-$Id$
-
-================== Release 5.0.0 ==============
-
- (No changes)
-
-================== Release 4.0.0-ALPHA ==============
-
-* SOLR-3254: Upgrade Solr to Tika 1.1 (janhoy)
-
-================== Release 3.6.1 ==================
-
-(No Changes)
-
-================== Release 3.6.0 ==================
-
-* SOLR-2346: Allow the content encoding to be set explicitly via the content type of the stream.
-  This is convenient when Tika's auto detector cannot detect the encoding, especially
-  when the text file is too short for detection. (koji)
-
-* SOLR-2901: Upgrade Solr to Tika 1.0 (janhoy)
-
-* SOLR-3295: netcdf jar is excluded from the binary release (and disabled in ivy.xml)
-  because it requires Java 6. If you want to parse this content and are willing to
-  use Java 6, just add the jar. (rmuir)
-
-================== Release 3.5.0 ==================
-
-* SOLR-2372: Upgrade Solr to Tika 0.10 (janhoy)
-
-================== Release 3.4.0 ==================
-
-* SOLR-2540: CommitWithin as an Update Request parameter
-  You can now specify &commitWithin=N (ms) on the update request (janhoy)
-
-* SOLR-2743: Remove commons logging. (koji)
-
-================== Release 3.3.0 ==================
-
-(No Changes)
-
-================== Release 3.2.0 ==================
-
-* SOLR-2480: Add ignoreTikaException flag so that users can ignore a TikaException but still index
-  metadata. (Shinichiro Abe, koji)
-
-================== Release 3.1.0 ==================
-
-* SOLR-1902: Upgraded to Tika 0.8 and changed deprecated parse call
-
-* SOLR-1756: The date.format setting causes a ClassCastException when enabled, and the config code that
-  parses this setting does not properly use the same iterator instance. (Christoph Brill, Mark Miller)
-
-* SOLR-1813: Add ICU4j to libs and add tests for Arabic extraction (Robert Muir via gsingers)
-
-* SOLR-1902: Upgraded to Tika 0.8-SNAPSHOT to incorporate passing in Solr's custom ClassLoader (gsingers)
-
-================== Release 1.4.0 ==================
-
-1. SOLR-284:  Added in support for extraction. (Eric Pugh, Chris Harris, gsingers)
-
-2. SOLR-284: Removed "silent success" key generation (gsingers)
-
-3. SOLR-1075: Upgrade to Tika 0.3.  See http://www.apache.org/dist/lucene/tika/CHANGES-0.3.txt (gsingers)
-
-4. SOLR-1128: Added metadata output to "extract only" option.  (gsingers)
-
-5. SOLR-1310: Upgrade to Tika 0.4. Note there are now some differences in language detection.
-    See http://www.lucidimagination.com/search/document/d6f1899a85b2a45c/vote_apache_tika_0_4_release_candidate_2#d6f1899a85b2a45c
-    for discussion on language detection.
-    See http://www.apache.org/dist/lucene/tika/CHANGES-0.4.txt. (gsingers)
-
-6. SOLR-1274: Added text serialization output for extractOnly (Peter Wolanin, gsingers)    
--- a/solr/contrib/extraction/README.txt
+++ b/solr/contrib/extraction/README.txt
@@ -0,0 +1,16 @@
+Apache Solr Content Extraction Library (Solr Cell)
+
+Introduction
+------------
+
+Apache Solr Extraction provides a means for extracting and indexing content contained in "rich" documents, such
+as Microsoft Word, Adobe PDF, etc.  (Each name is a trademark of its respective owner.)  This contrib module
+uses Apache Tika to extract content and metadata from the files, which can then be indexed.  For more information,
+see http://wiki.apache.org/solr/ExtractingRequestHandler
+
+Getting Started
+---------------
+You will need Solr up and running.  Then, simply add the extraction JAR file, plus the Tika dependencies (in the ./lib folder)
+to your Solr Home lib directory.  See http://wiki.apache.org/solr/ExtractingRequestHandler for more details on hooking it in
+and configuring it.
+
--- a/solr/contrib/uima/CHANGES.txt
+++ b/solr/contrib/uima/CHANGES.txt
@@ -1,100 +0,0 @@
-Apache Solr UIMA Metadata Extraction Library
-                            Release Notes
-
-This file describes changes to the Solr UIMA (contrib/uima) module. See SOLR-2129 for details.
-
-Introduction
------------
-This module is intended to be used both as an UpdateRequestProcessor while indexing documents and as a set of tokenizer/filters
-to be configured inside the schema.xml for use during the analysis phase.
-The purpose of UIMAUpdateRequestProcessor is to provide additional fields, automatically generated on the fly, to the Solr index.
-Such fields could be language, concepts, keywords, sentences, named entities, etc.
-UIMA-based tokenizers/filters can be used either inside plain Lucene or as index/query analyzers to be defined
-inside the schema.xml of a Solr core to create/filter tokens using specific UIMA annotations.
-
- UIMA Dependency
- ---------------
-uimaj-core          v2.3.1 
-OpenCalaisAnnotator v2.3.1
-HMMTagger           v2.3.1
-AlchemyAPIAnnotator v2.3.1
-WhitespaceTokenizer v2.3.1
-
-$Id$
-
-==================  5.0.0 ==============
-
-(No Changes)
-
-==================  4.0.0-ALPHA ==============
-
-(No Changes)
-
-==================  3.6.1 ==================
-
-(No Changes)
-
-==================  3.6.0 ==================
-
-(No Changes)
-
-==================  3.5.0 ==================
-
-Other Changes
----------------------
-
-* SOLR-2746: Upgraded dependencies from *-2.3.1-SNAPSHOT.jar to *-2.3.1.jar.
-
-==================  3.4.0 ==================
-
-(No Changes)
-
-==================  3.3.0 ==================
-
-New Features
----------------------
-
-* SOLR-2582: Use uniqueKey for error log in UIMAUpdateRequestProcessor.
-  (Tommaso Teofili via koji)
-  
-Bug Fixes
----------------------
-
-* SOLR-2579: UIMAUpdateRequestProcessor ignore error fails if text.length() < 100.
-  (Elmer Garduno via koji)
-
-* SOLR-2581: UIMAToSolrMapper wrongly instantiates Type with reflection.
-  (Tommaso Teofili via koji)
-
-==================  3.2.0 ==================
-
-Upgrading from Solr 3.1
----------------------
-
-* <uimaConfig/> just beneath <config> ... </config> is no longer supported.
-  It should be moved into the UIMAUpdateRequestProcessorFactory settings.
-  See contrib/uima/README.txt for more details. (SOLR-2436)
-
-New Features
----------------------
-
-* SOLR-2503: extend mapping function to map feature value to dynamicField. (koji)
-
-* SOLR-2512: add ignoreErrors flag so that users can ignore exceptions in AE.
-  (Tommaso Teofili, koji)
-
-Test Cases:
----------------------
-
-* SOLR-2387: add mock annotators for improved testing.
-  (Tommaso Teofili via rmuir)
-
-Other Changes
----------------------
-
-* SOLR-2436: move uimaConfig under the UIMA update processor in solrconfig.xml.
-  (Tommaso Teofili, koji)
-
-==================  3.1.0 ==================
-
-Initial Release
--- a/solr/contrib/uima/README.txt
+++ b/solr/contrib/uima/README.txt
@@ -1,3 +1,15 @@
+Apache Solr UIMA Metadata Extraction Library
+
+Introduction
+------------
+This module is intended to be used both as an UpdateRequestProcessor while indexing documents and as a set of tokenizer/filters
+to be configured inside the schema.xml for use during the analysis phase.
+The purpose of UIMAUpdateRequestProcessor is to provide additional fields, automatically generated on the fly, to the Solr index.
+Such fields could be language, concepts, keywords, sentences, named entities, etc.
+UIMA-based tokenizers/filters can be used either inside plain Lucene or as index/query analyzers to be defined
+inside the schema.xml of a Solr core to create/filter tokens using specific UIMA annotations.
+
+
 Getting Started
 ---------------
 To start using Solr UIMA Metadata Extraction Library you should go through the following configuration steps: