Apache Solr Content Extraction Library (Solr Cell)
Release Notes
This file describes changes to the Solr Cell (contrib/extraction) module. See SOLR-284 for details.
Introduction
------------
Apache Solr Extraction provides a means for extracting and indexing content contained in "rich" documents, such
as Microsoft Word, Adobe PDF, etc. (Each name is a trademark of its respective owner.) This contrib module
uses Apache Tika to extract content and metadata from the files, which can then be indexed. For more information,
see http://wiki.apache.org/solr/ExtractingRequestHandler
Getting Started
---------------
You will need Solr up and running. Then add the extraction JAR file, plus the Tika dependencies (in the ./lib folder),
to your Solr Home lib directory. See http://wiki.apache.org/solr/ExtractingRequestHandler for more details on hooking it in
and configuring it.
Tika Dependency
---------------
Current Version: Tika 1.1 (released 2012-03-23)
$Id$
================== Release 5.0.0 ==============
(No changes)
================== Release 4.0.0-ALPHA ==============
* SOLR-3254: Upgrade Solr to Tika 1.1 (janhoy)
================== Release 3.6.0 ==================
* SOLR-2346: Allow the content encoding to be set explicitly via the content type of the stream.
This is convenient when Tika's auto detector cannot detect the encoding, especially
when the text file is too short for detection. (koji)
* SOLR-2901: Upgrade Solr to Tika 1.0 (janhoy)
* SOLR-3295: The netcdf jar is excluded from the binary release (and disabled in ivy.xml)
because it requires Java 6. If you want to parse this content and are willing to
use Java 6, just add the jar. (rmuir)
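As an illustration of the SOLR-2346 entry above, the following Python sketch builds (but does not send) a POST request whose Content-Type header names the charset explicitly. The host, port, handler path (/update/extract), document id, and sample text are assumptions made for the example and are not taken from this file.

```python
from urllib.request import Request

# Assumed extraction endpoint and document id; adjust to your own Solr setup.
EXTRACT_URL = "http://localhost:8983/solr/update/extract?literal.id=doc1"


def build_text_request(payload, charset, url=EXTRACT_URL):
    """Build (without sending) a POST whose Content-Type declares the charset."""
    content_type = "text/plain; charset=" + charset
    return Request(
        url,
        data=payload,
        headers={"Content-Type": content_type},
        method="POST",
    )


# A very short text body, the case in which auto-detection is least reliable.
body = "\u77ed\u3044".encode("Shift_JIS")
req = build_text_request(body, "Shift_JIS")

# urllib stores header names capitalized as "Content-type".
print(req.get_header("Content-type"))
```

Sending the request (for example with urllib.request.urlopen) is left out so that the sketch runs without a Solr instance.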
================== Release 3.5.0 ==================
* SOLR-2372: Upgrade Solr to Tika 0.10 (janhoy)
================== Release 3.4.0 ==================
* SOLR-2540: CommitWithin as an Update Request parameter.
You can now specify &commitWithin=N (ms) on the update request. (janhoy)
* SOLR-2743: Remove commons logging. (koji)
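As an illustration of the SOLR-2540 entry above, the following Python sketch builds an extraction request URL that carries commitWithin. Only the commitWithin parameter name comes from the entry; the host, port, handler path (/update/extract), and document id are assumptions made for the example.

```python
from urllib.parse import urlencode

# Assumed base URL of an extraction handler; adjust to your own Solr setup.
EXTRACT_URL = "http://localhost:8983/solr/update/extract"


def build_extract_url(doc_id, commit_within_ms, base_url=EXTRACT_URL):
    """Return a URL asking Solr to commit within the given number of milliseconds."""
    if commit_within_ms < 0:
        raise ValueError("commit_within_ms must be non-negative")
    params = {
        "literal.id": doc_id,              # assumes the unique key field is named "id"
        "commitWithin": commit_within_ms,  # parameter from SOLR-2540, in milliseconds
    }
    return base_url + "?" + urlencode(params)


url = build_extract_url("doc1", 10000)
print(url)
```

The document itself would be posted to this URL as the request body; that step is omitted so the sketch runs without a Solr instance.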
================== Release 3.3.0 ==================
(No changes)
================== Release 3.2.0 ==================
* SOLR-2480: Add ignoreTikaException flag so that users can ignore a TikaException but still index
metadata. (Shinichiro Abe, koji)
================== Release 3.1.0 ==================
* SOLR-1902: Upgraded to Tika 0.8 and changed deprecated parse call
* SOLR-1756: The date.format setting causes ClassCastException when enabled and the config code that
parses this setting does not properly use the same iterator instance. (Christoph Brill, Mark Miller)
* SOLR-18913: Add ICU4j to libs and add tests for Arabic extraction (Robert Muir via gsingers)
* SOLR-1902: Upgraded to Tika 0.8-SNAPSHOT to incorporate passing in Solr's custom ClassLoader (gsingers)
================== Release 1.4.0 ==================
1. SOLR-284: Added in support for extraction. (Eric Pugh, Chris Harris, gsingers)
2. SOLR-284: Removed "silent success" key generation (gsingers)
3. SOLR-1075: Upgrade to Tika 0.3. See http://www.apache.org/dist/lucene/tika/CHANGES-0.3.txt (gsingers)
4. SOLR-1128: Added metadata output to "extract only" option. (gsingers)
5. SOLR-1310: Upgrade to Tika 0.4. Note there are some differences in detecting languages now.
See http://www.lucidimagination.com/search/document/d6f1899a85b2a45c/vote_apache_tika_0_4_release_candidate_2#d6f1899a85b2a45c
for discussion on language detection.
See http://www.apache.org/dist/lucene/tika/CHANGES-0.4.txt. (gsingers)
6. SOLR-1274: Added text serialization output for extractOnly (Peter Wolanin, gsingers)