lucene/solr/contrib/extraction/CHANGES.txt

Apache Solr Content Extraction Library (Solr Cell)
                            Release Notes

This file describes changes to the Solr Cell (contrib/extraction) module.  See SOLR-284 for details.

Introduction
------------

Apache Solr Extraction provides a means for extracting and indexing content contained in "rich" documents, such
as Microsoft Word, Adobe PDF, etc.  (Each name is a trademark of its respective owner.)  This contrib module
uses Apache Tika to extract content and metadata from the files, which can then be indexed.  For more information,
see http://wiki.apache.org/solr/ExtractingRequestHandler

Getting Started
---------------
You will need Solr up and running.  Then add the extraction JAR file, plus the Tika dependencies (in the ./lib folder),
to your Solr Home lib directory.  See http://wiki.apache.org/solr/ExtractingRequestHandler for more details on enabling
and configuring it.

Tika Dependency
---------------

Current Version: Tika 1.0 (released 2011-11-07)

$Id$

================== Release 4.0.0-dev ==================

(No Changes)

================== Release 3.6.0 ==================

* SOLR-2346: Allow the content encoding to be set explicitly via the content type of the stream.
  This is convenient when Tika's auto detector cannot detect the encoding, especially
  when the text file is too short for detection. (koji)

* SOLR-2901: Upgrade Solr to Tika 1.0 (janhoy)

* SOLR-3295: The netcdf jar is excluded from the binary release (and disabled in ivy.xml)
  because it requires Java 6. If you want to parse this content and are willing to
  use Java 6, just add the jar. (rmuir)
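
As an illustration of the SOLR-2346 entry above, the following is a minimal sketch of a
request that declares its encoding in the Content-Type header. The host, the /update/extract
handler path, the literal.id value, and the helper name build_extract_request are assumptions
made for this example; adjust them to match your own solrconfig.xml. The request is only
built here, not sent.

```python
import urllib.request

# Assumption: Solr runs locally and the extracting handler is mapped to
# /update/extract. Neither is stated in this file.
SOLR_EXTRACT_URL = "http://localhost:8983/solr/update/extract?literal.id=doc1"


def build_extract_request(body: bytes, charset: str) -> urllib.request.Request:
    """Build (but do not send) a POST whose Content-Type names the charset."""
    return urllib.request.Request(
        SOLR_EXTRACT_URL,
        data=body,
        headers={"Content-Type": f"text/plain; charset={charset}"},
        method="POST",
    )


if __name__ == "__main__":
    # A very short text body, the case where auto detection tends to fail.
    short_text = "\u77ed\u3044".encode("shift_jis")
    request = build_extract_request(short_text, "Shift_JIS")
    print(request.get_header("Content-type"))
```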

================== Release 3.5.0 ==================

* SOLR-2372: Upgrade Solr to Tika 0.10 (janhoy)

================== Release 3.4.0 ==================

* SOLR-2540: Support commitWithin as an update request parameter.
  You can now specify &commitWithin=N (N in milliseconds) on the update request. (janhoy)

* SOLR-2743: Remove commons logging. (koji)
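
As an illustration of the SOLR-2540 entry above, this sketch builds an extraction URL that
carries commitWithin. The base URL, the document id, and the helper name extract_url are
hypothetical; only the commitWithin parameter itself comes from the entry.

```python
from urllib.parse import urlencode


def extract_url(base: str, doc_id: str, commit_within_ms: int) -> str:
    """Return an extraction URL asking Solr to commit within the given milliseconds."""
    params = {
        "literal.id": doc_id,              # assumed unique-key field for the example
        "commitWithin": commit_within_ms,  # N in milliseconds, per SOLR-2540
    }
    return f"{base}?{urlencode(params)}"


if __name__ == "__main__":
    # Assumed local handler path; change it to match your configuration.
    print(extract_url("http://localhost:8983/solr/update/extract", "doc1", 10000))
```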

================== Release 3.3.0 ==================

(No Changes)

================== Release 3.2.0 ==================

* SOLR-2480: Add ignoreTikaException flag so that users can ignore a TikaException but still index
  metadata. (Shinichiro Abe, koji)
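
As an illustration of the SOLR-2480 entry above, this sketch adds the flag to an existing
extraction URL. It assumes the flag is passed as a request parameter with the value "true";
the helper name with_ignore_tika_exception and the sample URL are hypothetical.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_ignore_tika_exception(url: str) -> str:
    """Return url with ignoreTikaException=true set exactly once in its query."""
    parts = urlsplit(url)
    # Drop any existing value so the flag is not duplicated.
    query = [(k, v) for k, v in parse_qsl(parts.query) if k != "ignoreTikaException"]
    query.append(("ignoreTikaException", "true"))
    return urlunsplit(parts._replace(query=urlencode(query)))


if __name__ == "__main__":
    print(with_ignore_tika_exception(
        "http://localhost:8983/solr/update/extract?literal.id=doc1"))
```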

================== Release 3.1.0 ==================

* SOLR-1902: Upgraded to Tika 0.8 and changed deprecated parse call

* SOLR-1756: Fixed: the date.format setting causes ClassCastException when enabled, and the config code that
  parses this setting does not properly use the same iterator instance. (Christoph Brill, Mark Miller)

* SOLR-1813: Add ICU4j to libs and add tests for Arabic extraction (Robert Muir via gsingers)

* SOLR-1902: Upgraded to Tika 0.8-SNAPSHOT to incorporate passing in Solr's custom ClassLoader (gsingers)

================== Release 1.4.0 ==================

1. SOLR-284:  Added in support for extraction. (Eric Pugh, Chris Harris, gsingers)

2. SOLR-284: Removed "silent success" key generation (gsingers)

3. SOLR-1075: Upgrade to Tika 0.3.  See http://www.apache.org/dist/lucene/tika/CHANGES-0.3.txt (gsingers)

4. SOLR-1128: Added metadata output to "extract only" option.  (gsingers)

5. SOLR-1310: Upgrade to Tika 0.4. Note that there are now some differences in language detection.
    See http://www.lucidimagination.com/search/document/d6f1899a85b2a45c/vote_apache_tika_0_4_release_candidate_2#d6f1899a85b2a45c
    for discussion on language detection.
    See http://www.apache.org/dist/lucene/tika/CHANGES-0.4.txt. (gsingers)

6. SOLR-1274: Added text serialization output for extractOnly (Peter Wolanin, gsingers)