mirror of https://github.com/apache/lucene.git
SOLR-3650: migrate DIH CHANGES.txt
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1368190 13f79535-47bb-0310-9956-ffa450edef68
parent f1ae7dad35
commit 4eb362c0b3
507
solr/CHANGES.txt
@ -709,6 +709,13 @@ Bug Fixes
* SOLR-3470: contrib/clustering: custom Carrot2 tokenizer and stemmer factories
are respected now (Stanislaw Osinski, Dawid Weiss)

* SOLR-3430: Added a new DIH test against a real SQL database. Fixed problems
revealed by this new test related to the expanded cache support added to
3.6/SOLR-2382 (James Dyer)

* SOLR-1958: When using the MailEntityProcessor, import would fail if
fetchMailsSince was not specified. (Max Lynch via James Dyer)


Other Changes
----------------------
@ -862,7 +869,13 @@ Other Changes
* SOLR-3534: The Dismax and eDismax query parsers will fall back on the 'df' parameter
when 'qf' is absent. If neither parameter is present and the schema defines no default
search field, an exception is now thrown. (dsmiley)


* SOLR-3262: The "threads" feature of DIH is removed (deprecated in Solr 3.6)
(James Dyer)

* SOLR-3422: Refactored DIH internal data classes. All entities in
data-config.xml must have a name (James Dyer)

Documentation
----------------------

@ -898,6 +911,17 @@ Bug Fixes:
* SOLR-3470: contrib/clustering: custom Carrot2 tokenizer and stemmer factories
are respected now (Stanislaw Osinski, Dawid Weiss)

* SOLR-3360: More DIH bug fixes for the deprecated "threads" parameter.
(Mikhail Khludnev, Claudio R, via James Dyer)

* SOLR-3430: Added a new DIH test against a real SQL database. Fixed problems
revealed by this new test related to the expanded cache support added to
3.6/SOLR-2382 (James Dyer)

* SOLR-3336: SolrEntityProcessor substitutes most variables at query time.
(Michael Kroh, Lance Norskog, via Martijn van Groningen)


================== 3.6.0 ==================
More information about this release, including any errata related to the
release notes, upgrade instructions, or other changes may be found online at:
@ -1050,6 +1074,27 @@ New Features
auto detector cannot detect encoding, especially the text file is too short
to detect encoding. (koji)

* SOLR-1499: Added SolrEntityProcessor that imports data from another Solr core
or instance based on a specified query.
(Lance Norskog, Erik Hatcher, Pulkit Singhal, Ahmet Arslan, Luca Cavanna,
Martijn van Groningen)

* SOLR-3190: Minor improvements to SolrEntityProcessor. Add more consistency
between solr parameters and parameters used in SolrEntityProcessor and
ability to specify a custom HttpClient instance.
(Luca Cavanna via Martijn van Groningen)

* SOLR-2382: Added pluggable cache support to DIH so that any Entity can be
made cache-able by adding the "cacheImpl" parameter. Include
"SortedMapBackedCache" to provide in-memory caching (as previously this was
the only option when using CachedSqlEntityProcessor). Users can provide
their own implementations of DIHCache for other caching strategies.
Deprecate CachedSqlEntityProcessor in favor of specifying "cacheImpl" with
SqlEntityProcessor. Make SolrWriter implement DIHWriter and allow the
possibility of pluggable Writers (DIH writing to something other than Solr).
(James Dyer, Noble Paul)


Optimizations
----------------------
* SOLR-1931: Speedup for LukeRequestHandler and admin/schema browser. New parameter
@ -1296,6 +1341,10 @@ Other Changes
extracting request handler and are willing to use java 6, just add the jar.
(rmuir)

* SOLR-3142: DIH Imports no longer default optimize to true, instead false.
If you want to force all segments to be merged into one, you can specify
this parameter yourself. NOTE: this can be a very expensive operation and
usually does not make sense for delta-imports. (Robert Muir)

Build
----------------------
@ -1393,6 +1442,9 @@ Bug Fixes
a wrong number of collation results in the response.
(Bastiaan Verhoef, James Dyer via Simon Willnauer)

* SOLR-2875: Fix the incorrect url in DIH example tika-data-config.xml
(Shinichiro Abe via koji)

Other Changes
----------------------

@ -1585,6 +1637,24 @@ Bug Fixes
* SOLR-2692: contrib/clustering: Typo in param name fixed: "carrot.fragzise"
changed to "carrot.fragSize" (Stanislaw Osinski).

* SOLR-2644: When using DIH with threads=2 the default logging is set too high
(Bill Bell via shalin)

* SOLR-2492: DIH does not commit if only deletes are processed
(James Dyer via shalin)

* SOLR-2186: DataImportHandler's multi-threaded option throws NPE
(Lance Norskog, Frank Wesemann, shalin)

* SOLR-2655: DIH multi threaded mode does not resolve attributes correctly
(Frank Wesemann, shalin)

* SOLR-2695: DIH: Documents are collected in unsynchronized list in
multi-threaded debug mode (Michael McCandless, shalin)

* SOLR-2668: DIH multithreaded mode does not rollback on errors from
EntityProcessor (Frank Wesemann, shalin)

Other Changes
----------------------

@ -1697,6 +1767,9 @@ Bug Fixes
* SOLR-2581: UIMAToSolrMapper wrongly instantiates Type with reflection.
(Tommaso Teofili via koji)

* SOLR-2551: Check dataimport.properties for write access (if delta-import is
supported in DIH configuration) before starting an import (C S, shalin)

Other Changes
----------------------

@ -2141,6 +2214,30 @@ New Features

* SOLR-2237: Added StempelPolishStemFilterFactory to contrib/analysis-extras (rmuir)

* SOLR-1525: allow DIH to refer to core properties (noble)

* SOLR-1547: DIH TemplateTransformer copies objects more intelligently when the
template is a single variable (noble)

* SOLR-1627: DIH VariableResolver should be fetched just in time (noble)

* SOLR-1583: DIH Create DataSources that return InputStream (noble)

* SOLR-1358: Integration of Tika and DataImportHandler (Akshay Ukey, noble)

* SOLR-1654: TikaEntityProcessor example added to DIHExample
(Akshay Ukey via noble)

* SOLR-1678: Move onError handling to DIH framework (noble)

* SOLR-1352: Multi-threaded implementation of DIH (noble)

* SOLR-1721: Add explicit option to run DataImportHandler in synchronous mode
(Alexey Serba via noble)

* SOLR-1737: Added FieldStreamDataSource (noble)


Optimizations
----------------------

@ -2166,6 +2263,9 @@ Optimizations
SolrIndexSearcher.doc(int, Set<String>) method b/c it can use the document
cache (gsingers)

* SOLR-2200: Improve the performance of DataImportHandler for large
delta-import updates. (Mark Waddle via rmuir)

Bug Fixes
----------------------
* SOLR-1769: Solr 1.4 Replication - Repeater throwing NullPointerException (Jörgen Rydenius via noble)
@ -2428,6 +2528,61 @@ Bug Fixes
does not properly use the same iterator instance.
(Christoph Brill, Mark Miller)

* SOLR-1638: Fixed NullPointerException during DIH import if uniqueKey is not
specified in schema (Akshay Ukey via shalin)

* SOLR-1639: Fixed misleading error message when dataimport.properties is not
writable (shalin)

* SOLR-1598: DIH: Reader used in PlainTextEntityProcessor is not explicitly
closed (Sascha Szott via noble)

* SOLR-1759: DIH: $skipDoc was not working correctly
(Gian Marco Tagliani via noble)

* SOLR-1762: DIH: DateFormatTransformer does not work correctly with
non-default locale dates (tommy chheng via noble)

* SOLR-1757: DIH multithreading sometimes throws NPE (noble)

* SOLR-1766: DIH with threads enabled doesn't respond to the abort command
(Michael Henson via noble)

* SOLR-1767: dataimporter.functions.escapeSql() does not escape backslash
character (Sean Timm via noble)

* SOLR-1811: formatDate should use the current NOW value always
(Sean Timm via noble)

* SOLR-1794: Dataimport of CLOB fields fails when getCharacterStream() is
defined in a superclass. (Gunnar Gauslaa Bergem via rmuir)

* SOLR-2057: DataImportHandler never calls UpdateRequestProcessor.finish()
(Drew Farris via koji)

* SOLR-1973: Empty fields in XML update messages confuse DataImportHandler.
(koji)

* SOLR-2221: Use StrUtils.parseBool() to get values of boolean options in DIH.
true/on/yes (for TRUE) and false/off/no (for FALSE) can be used for
sub-options (debug, verbose, synchronous, commit, clean, optimize) for
full/delta-import commands. (koji)

* SOLR-2310: DIH: getTimeElapsedSince() returns incorrect hour value when
the elapsed time is over 60 hours (tom liu via koji)

* SOLR-2252: DIH: When a child entity in nested entities is rootEntity="true",
delta-import doesn't work. (koji)

* SOLR-2330: solrconfig.xml files in example-DIH are broken. (Matt Parker, koji)

* SOLR-1191: resolve DataImportHandler deltaQuery column against pk when pk
has a prefix (e.g. pk="book.id" deltaQuery="select id from ..."). More
useful error reporting when no match found (previously failed with a
NullPointerException in log and no clear user feedback). (gthb via yonik)

* SOLR-2116: Fix TikaConfig classloader bug in TikaEntityProcessor
(Martijn van Groningen via hossman)

Other Changes
----------------------
@ -2561,6 +2716,12 @@ Other Changes
* SOLR-1813: Add ICU4j to contrib/extraction libs and add tests for Arabic
extraction (Robert Muir via gsingers)

* SOLR-1821: Fix TimeZone-dependent test failure in TestEvaluatorBag.
(Chris Male via rmuir)

* SOLR-2367: Reduced noise in test output by ensuring the properties file
can be written. (Gunnlaugur Thor Briem via rmuir)

Build
----------------------

@ -2645,6 +2806,33 @@ error. See SOLR-1410 for more information.
* RussianLowerCaseFilterFactory
* RussianLetterTokenizerFactory

DIH: Evaluator API has been changed in a non back-compatible way. Users who
have developed custom Evaluators will need to change their code according to
the new API for it to work. See SOLR-996 for details.

DIH: The formatDate evaluator's syntax has been changed. The new syntax is
formatDate(<variable>, '<format_string>'). For example,
formatDate(x.date, 'yyyy-MM-dd'). In the old syntax, the date string was
written without single quotes. The old syntax has been deprecated and will
be removed in 1.5; until then, using the old syntax will log a warning.

DIH: The Context API has been changed in a non back-compatible way. In
particular, the Context.currentProcess() method now returns a String
describing the type of the current import process instead of an int.
Similarly, the public constants in Context viz. FULL_DUMP, DELTA_DUMP and
FIND_DELTA are changed to a String type. See SOLR-969 for details.

DIH: The EntityProcessor API has been simplified by moving logic for applying
transformers and handling multi-row outputs from Transformers into an
EntityProcessorWrapper class. The EntityProcessor#destroy is now called once
per parent-row at the end of row (end of data). A new method
EntityProcessor#close is added which is called at the end of import.

DIH: In Solr 1.3, if the last_index_time was not available (first import) and
a delta-import was requested, a full-import was run instead. This is no longer
the case. In Solr 1.4 delta import is run with last_index_time as the epoch
date (January 1, 1970, 00:00:00 GMT) if last_index_time is not available.

Versions of Major Components
----------------------------
Apache Lucene 2.9.1 (r832363 on 2.9 branch)
@ -2936,6 +3124,141 @@ New Features
86. SOLR-1274: Added text serialization output for extractOnly
(Peter Wolanin, gsingers)

87. SOLR-768: DIH: Set last_index_time variable in full-import command.
(Wojtek Piaseczny, Noble Paul via shalin)

88. SOLR-811: Allow a "deltaImportQuery" attribute in SqlEntityProcessor
which is used for delta imports instead of DataImportHandler manipulating
the SQL itself. (Noble Paul via shalin)

89. SOLR-842: Better error handling in DataImportHandler with options to
abort, skip and continue imports. (Noble Paul, shalin)

90. SOLR-833: DIH: A DataSource to read data from a field as a reader. This
can be used, for example, to read XMLs residing as CLOBs or BLOBs in
databases. (Noble Paul via shalin)

91. SOLR-887: A DIH Transformer to strip HTML tags. (Ahmed Hammad via shalin)

92. SOLR-886: DataImportHandler should rollback when an import fails or it is
aborted (shalin)

93. SOLR-891: A DIH Transformer to read strings from Clob type.
(Noble Paul via shalin)

94. SOLR-812: Configurable JDBC settings in JdbcDataSource including optimized
defaults for read only mode. (David Smiley, Glen Newton, shalin)

95. SOLR-910: Add a few utility commands to the DIH admin page such as full
import, delta import, status, reload config. (Ahmed Hammad via shalin)

96. SOLR-938: Add event listener API for DIH import start and end.
(Kay Kay, Noble Paul via shalin)

97. SOLR-801: DIH: Add support for configurable pre-import and post-import
delete query per root-entity. (Noble Paul via shalin)

98. SOLR-988: Add a new scope for session data stored in Context to store
objects across imports. (Noble Paul via shalin)

99. SOLR-980: A PlainTextEntityProcessor which can read from any
DataSource<Reader> and output a String.
(Nathan Adams, Noble Paul via shalin)

100.SOLR-1003: XPathEntityProcessor must allow slurping all text from a given
xml node and its children. (Noble Paul via shalin)

101.SOLR-1001: Allow variables in various attributes of RegexTransformer,
HTMLStripTransformer and NumberFormatTransformer.
(Fergus McMenemie, Noble Paul, shalin)

102.SOLR-989: DIH: Expose running statistics from the Context API.
(Noble Paul, shalin)

103.SOLR-996: DIH: Expose Context to Evaluators. (Noble Paul, shalin)

104.SOLR-783: DIH: Enhance delta-imports by maintaining separate
last_index_time for each entity. (Jon Baer, Noble Paul via shalin)

105.SOLR-1033: Current entity's namespace is made available to all DIH
Transformers. This allows one to use an output field of TemplateTransformer
in other transformers, among other things.
(Fergus McMenemie, Noble Paul via shalin)

106.SOLR-1066: New methods in DIH Context to expose Script details.
ScriptTransformer changed to read scripts through the new API methods.
(Noble Paul via shalin)

107.SOLR-1062: A DIH LogTransformer which can log data in a given template
format. (Jon Baer, Noble Paul via shalin)

108.SOLR-1065: A DIH ContentStreamDataSource which can accept HTTP POST data
in a content stream. This can be used to push data to Solr instead of
just pulling it from DB/Files/URLs. (Noble Paul via shalin)

109.SOLR-1061: Improve DIH RegexTransformer to create multiple columns from
regex groups. (Noble Paul via shalin)

110.SOLR-1059: Special DIH flags introduced for deleting documents by query or
id, skipping rows and stopping further transforms. Use $deleteDocById,
$deleteDocByQuery for deleting by id and query respectively. Use $skipRow
to skip the current row but continue with the document. Use $stopTransform
to stop further transformers. New methods are introduced in Context for
deleting by id and query. (Noble Paul, Fergus McMenemie, shalin)

111.SOLR-1076: JdbcDataSource should resolve DIH variables in all its
configuration parameters. (shalin)

112.SOLR-1055: Make DIH JdbcDataSource easily extensible by making the
createConnectionFactory method protected and return a
Callable<Connection> object. (Noble Paul, shalin)

113.SOLR-1058: DIH: JdbcDataSource can lookup javax.sql.DataSource using JNDI.
Use a jndiName attribute to specify the location of the data source.
(Jason Shepherd, Noble Paul via shalin)

114.SOLR-1083: A DIH Evaluator for escaping query characters.
(Noble Paul, shalin)

115.SOLR-934: A MailEntityProcessor to enable indexing mails from
POP/IMAP sources into a solr index. (Preetam Rao, shalin)

116.SOLR-1060: A DIH LineEntityProcessor which can stream lines of text from a
given file to be indexed directly or for processing with transformers and
child entities.
(Fergus McMenemie, Noble Paul, shalin)

117.SOLR-1127: Add support for DIH field name to be templatized.
(Noble Paul, shalin)

118.SOLR-1092: Added a new DIH command named 'import' which does not
automatically clean the index. This is useful and more appropriate when one
needs to import only some of the entities.
(Noble Paul via shalin)

119.SOLR-1153: DIH 'deltaImportQuery' is honored on child entities as well
(noble)

120.SOLR-1230: Enhanced dataimport.jsp to work with all DataImportHandler
request handler configurations, rather than just a hardcoded /dataimport
handler. (ehatcher)

121.SOLR-1235: disallow period (.) in DIH entity names (noble)

122.SOLR-1234: Multiple DIH does not work because all of them write to
dataimport.properties. Use the handler name as the properties file name
(noble)

123.SOLR-1348: Support binary field type in convertType logic in DIH
JdbcDataSource (shalin)

124.SOLR-1406: DIH: Make FileDataSource and FileListEntityProcessor more
extensible (Luke Forehand, shalin)

125.SOLR-1437: DIH: XPathEntityProcessor can deal with xpath syntaxes such as
//tagname , /root//tagname (Fergus McMenemie via noble)


Optimizations
----------------------
1. SOLR-374: Use IndexReader.reopen to save resources by re-using parts of the
@ -2993,6 +3316,21 @@ Optimizations
17. SOLR-1296: Enables setting IndexReader's termInfosIndexDivisor via a new attribute to StandardIndexReaderFactory. Enables
setting termIndexInterval to IndexWriter via SolrIndexConfig. (Jason Rutherglen, hossman, gsingers)

18. SOLR-846: DIH: Reduce memory consumption during delta import by removing
keys when used (Ricky Leung, Noble Paul via shalin)

19. SOLR-974: DataImportHandler skips commit if no data has been updated.
(Wojtek Piaseczny, shalin)

20. SOLR-1004: DIH: Check for abort more frequently during delta-imports.
(Marc Sturlese, shalin)

21. SOLR-1098: DIH DateFormatTransformer can cache the format objects.
(Noble Paul via shalin)

22. SOLR-1465: Replaced string concatenations with StringBuilder append
calls in DIH XPathRecordReader. (Mark Miller, shalin)

Bug Fixes
----------------------
1. SOLR-774: Fixed logging level display (Sean Timm via Otis Gospodnetic)
@ -3210,6 +3548,103 @@ Bug Fixes
caused an error to be returned, although the deletes were
still executed. (asmodean via yonik)

76. SOLR-800: Deep copy collections to avoid ConcurrentModificationException
in XPathEntityProcessor while streaming
(Kyle Morrison, Noble Paul via shalin)

77. SOLR-823: Request parameter variables ${dataimporter.request.xxx} are not
resolved in DIH (Mck SembWever, Noble Paul, shalin)

78. SOLR-728: Add synchronization to avoid race condition of multiple DIH
imports working concurrently (Walter Ferrara, shalin)

79. SOLR-742: Add ability to create dynamic fields with custom
DataImportHandler transformers (Wojtek Piaseczny, Noble Paul, shalin)

80. SOLR-832: Rows parameter is not honored in DIH non-debug mode and can
abort a running import in debug mode. (Akshay Ukey, shalin)

81. SOLR-838: The DIH VariableResolver obtained from a DataSource's context
does not have current data. (Noble Paul via shalin)

82. SOLR-864: DataImportHandler does not catch and log Errors (shalin)

83. SOLR-873: Fix case-sensitive field names and columns (Jon Baer, shalin)

84. SOLR-893: Unable to delete documents via SQL and deletedPkQuery with
deltaimport (Dan Rosher via shalin)

85. SOLR-888: DIH DateFormatTransformer cannot convert non-string type
(Amit Nithian via shalin)

86. SOLR-841: DataImportHandler should throw exception if a field does not
have column attribute (Michael Henson, shalin)

87. SOLR-884: CachedSqlEntityProcessor should check if the cache key is
present in the query results (Noble Paul via shalin)

88. SOLR-985: Fix thread-safety issue with DIH TemplateString for concurrent
imports with multiple cores. (Ryuuichi Kumai via shalin)

89. SOLR-999: DIH XPathRecordReader fails on XMLs with nodes mixed with
CDATA content. (Fergus McMenemie, Noble Paul via shalin)

90. SOLR-1000: DIH FileListEntityProcessor should not apply fileName filter to
directory names. (Fergus McMenemie via shalin)

91. SOLR-1009: Repeated column names result in duplicate values.
(Fergus McMenemie, Noble Paul via shalin)

92. SOLR-1017: Fix DIH thread-safety issue with last_index_time for concurrent
imports in multiple cores due to unsafe usage of SimpleDateFormat by
multiple threads. (Ryuuichi Kumai via shalin)

93. SOLR-1024: Calling abort on DataImportHandler import commits data instead
of calling rollback. (shalin)

94. SOLR-1037: DIH should not add null values in a row returned by
EntityProcessor to documents. (shalin)

95. SOLR-1040: DIH XPathEntityProcessor fails with an xpath like
/feed/entry/link[@type='text/html']/@href (Noble Paul via shalin)

96. SOLR-1042: Fix memory leak in DIH by making TemplateString non-static
member in VariableResolverImpl (Ryuuichi Kumai via shalin)

97. SOLR-1053: IndexOutOfBoundsException in DIH SolrWriter.getResourceAsString
when size of data-config.xml is a multiple of 1024 bytes.
(Herb Jiang via shalin)

98. SOLR-1077: IndexOutOfBoundsException with useSolrAddSchema in DIH
XPathEntityProcessor. (Sam Keen, Noble Paul via shalin)

99. SOLR-1080: DIH RegexTransformer should not replace if regex is not matched.
(Noble Paul, Fergus McMenemie via shalin)

100.SOLR-1090: DataImportHandler should load the data-config.xml using UTF-8
encoding. (Rui Pereira, shalin)

101.SOLR-1146: ConcurrentModificationException in DataImporter.getStatusMessages
(Walter Ferrara, Noble Paul via shalin)

102.SOLR-1229: Fixes for DIH deletedPkQuery, particularly when using
transformed Solr unique id's
(Lance Norskog, Noble Paul via ehatcher)

103.SOLR-1286: Fix the DIH commit parameter always defaulting to "true" even
if "false" is explicitly passed in. (Jay Hill, Noble Paul via ehatcher)

104.SOLR-1323: Reset XPathEntityProcessor's $hasMore/$nextUrl when fetching
next URL (noble, ehatcher)

105.SOLR-1450: DIH: Jdbc connection properties such as batchSize are not
applied if the driver jar is placed in solr_home/lib.
(Steve Sun via shalin)

106.SOLR-1474: DIH Delta-import should run even if last_index_time is not set.
(shalin)


Other Changes
----------------------
1. Upgraded to Lucene 2.4.0 (yonik)
@ -3357,6 +3792,55 @@ Other Changes
for discussion on language detection.
See http://www.apache.org/dist/lucene/tika/CHANGES-0.4.txt. (gsingers)

53. SOLR-782: DIH: Refactored SolrWriter to make it a concrete class and
removed wrappers over SolrInputDocument. Refactored to load Evaluators
lazily. Removed multiple document nodes in the configuration xml. Removed
support for 'default' variables, they are automatically available as
request parameters. (Noble Paul via shalin)

54. SOLR-964: DIH: XPathEntityProcessor now ignores DTD validations
(Fergus McMenemie, Noble Paul via shalin)

55. SOLR-1029: DIH: Standardize Evaluator parameter parsing and added helper
functions for parsing all evaluator parameters in a standard way.
(Noble Paul, shalin)

56. SOLR-1081: Change DIH EventListener to be an interface so that components
such as an EntityProcessor or a Transformer can act as an event listener.
(Noble Paul, shalin)

57. SOLR-1027: DIH: Alias the 'dataimporter' namespace to a shorter name 'dih'.
(Noble Paul via shalin)

58. SOLR-1084: Better error reporting when DIH entity name is a reserved word
and data-config.xml root node is not <dataConfig>.
(Noble Paul via shalin)

59. SOLR-1087: Deprecate 'where' attribute in CachedSqlEntityProcessor in
favor of cacheKey and cacheLookup. (Noble Paul via shalin)

60. SOLR-969: Change the FULL_DUMP, DELTA_DUMP, FIND_DELTA constants in DIH
Context to String. Change Context.currentProcess() to return a string
instead of an integer. (Kay Kay, Noble Paul, shalin)

61. SOLR-1120: Simplified DIH EntityProcessor API by moving logic for applying
transformers and handling multi-row outputs from Transformers into an
EntityProcessorWrapper class. The behavior of the method
EntityProcessor#destroy has been modified to be called once per parent-row
at the end of row. A new method EntityProcessor#close is added which is
called at the end of import. A new method
Context#getResolvedEntityAttribute is added which returns the resolved
value of an entity's attribute. Introduced a DocWrapper which takes care
of maintaining document level session variables.
(Noble Paul, shalin)

62. SOLR-1265: Add DIH variable resolving for URLDataSource properties like
baseUrl. (Chris Eldredge via ehatcher)

63. SOLR-1269: Better error messages from DIH JdbcDataSource when JDBC Driver
name or SQL is incorrect. (ehatcher, shalin)


Build
----------------------
1. SOLR-776: Added in ability to sign artifacts via Ant for releases (gsingers)
@ -3382,6 +3866,10 @@ Documentation

3. SOLR-1409: Added Solr Powered By Logos

4. SOLR-1369: Add HSQLDB Jar to example-DIH, unzip database and update
instructions.


================== Release 1.3.0 ==================

Upgrading from Solr 1.2
@ -3727,7 +4215,10 @@ New Features
71. SOLR-1129 : Support binding dynamic fields to beans in SolrJ (Avlesh Singh , noble)

72. SOLR-920 : Cache and reuse IndexSchema . A new attribute added in solr.xml called 'shareSchema' (noble)


73. SOLR-700: DIH: Allow configurable locales through a locale attribute in
fields for NumberFormatTransformer. (Stefan Oestreicher, shalin)

Changes in runtime behavior
1. SOLR-559: use Lucene updateDocument, deleteDocuments methods. This
removes the maxBufferedDeletes parameter added by SOLR-310 as Lucene
@ -3942,6 +4433,18 @@ Bug Fixes

50. SOLR-749: Allow QParser and ValueSourceParsers to be extended with same name (hossman, gsingers)

51. SOLR-704: DIH NumberFormatTransformer can silently ignore part of the
string while parsing. Now it tries to use the complete string for parsing.
Failure to do so will result in an exception.
(Stefan Oestreicher via shalin)

52. SOLR-729: DIH Context.getDataSource(String) gives current entity's
DataSource instance regardless of argument. (Noble Paul, shalin)

53. SOLR-726: DIH: Jdbc Drivers and DataSources fail to load if placed in
multicore sharedLib or core's lib directory.
(Walter Ferrara, Noble Paul, shalin)

Other Changes
1. SOLR-135: Moved common classes to org.apache.solr.common and altered the
build scripts to make two jars: apache-solr-1.3.jar and
@ -1,547 +0,0 @@
|
|||
Apache Solr - DataImportHandler
|
||||
Release Notes
|
||||
|
||||
Introduction
|
||||
------------
|
||||
DataImportHandler is a data import tool for Solr which makes importing data from Databases, XML files and
|
||||
HTTP data sources quick and easy.
|
||||
|
||||
|
||||
$Id$
|
||||
================== 5.0.0 ==============
|
||||
|
||||
(No changes)
|
||||
|
||||
================== 4.0.0-ALPHA ==============
|
||||
Bug Fixes
|
||||
----------------------
|
||||
* SOLR-3430: Added a new test against a real SQL database. Fixed problems revealed by this new test
|
||||
related to the expanded cache support added to 3.6/SOLR-2382 (James Dyer)
|
||||
|
||||
* SOLR-1958: When using the MailEntityProcessor, import would fail if fetchMailsSince was not specified.
|
||||
(Max Lynch via James Dyer)
|
||||
|
||||
Other Changes
|
||||
----------------------
|
||||
* SOLR-3262: The "threads" feature is removed (deprecated in Solr 3.6) (James Dyer)
|
||||
|
||||
* SOLR-3422: Refactored internal data classes.
|
||||
All entities in data-config.xml must have a name (James Dyer)
|
||||
|
||||
================== 3.6.1 ==================
|
||||
|
||||
Bug Fixes
|
||||
----------------------
|
||||
* SOLR-3360: More bug fixes for the deprecated "threads" parameter. (Mikhail Khludnev, Claudio R, via James Dyer)
|
||||
|
||||
* SOLR-3430: Added a new test against a real SQL database. Fixed problems revealed by this new test
|
||||
related to the expanded cache support added to 3.6/SOLR-2382 (James Dyer)
|
||||
|
||||
* SOLR-3336: SolrEntityProcessor substitutes most variables at query time.
|
||||
(Michael Kroh, Lance Norskog, via Martijn van Groningen)
|
||||
|
||||
================== 3.6.0 ==================
|
||||
|
||||
New Features
|
||||
----------------------
|
||||
* SOLR-1499: Added SolrEntityProcessor that imports data from another Solr core or instance based on a specified query.
|
||||
(Lance Norskog, Erik Hatcher, Pulkit Singhal, Ahmet Arslan, Luca Cavanna, Martijn van Groningen)
|
||||
Additional Work:
|
||||
SOLR-3190: Minor improvements to SolrEntityProcessor. Add more consistency between solr parameters
|
||||
and parameters used in SolrEntityProcessor and ability to specify a custom HttpClient instance.
|
||||
(Luca Cavanna via Martijn van Groningen)
|
||||
* SOLR-2382: Added pluggable cache support so that any Entity can be made cache-able by adding the "cacheImpl" parameter.
|
||||
Include "SortedMapBackedCache" to provide in-memory caching (as previously this was the only option when
|
||||
using CachedSqlEntityProcessor). Users can provide their own implementations of DIHCache for other
|
||||
caching strategies. Deprecate CachedSqlEntityProcessor in favor of specifying "cacheImpl" with
|
||||
SqlEntityProcessor. Make SolrWriter implement DIHWriter and allow the possibility of pluggable Writers
|
||||
(DIH writing to something other than Solr). (James Dyer, Noble Paul)
|
||||
|
||||
Changes in Runtime Behavior
|
||||
----------------------
|
||||
* SOLR-3142: Imports no longer default optimize to true, instead false. If you want to force all segments to be merged
|
||||
into one, you can specify this parameter yourself. NOTE: this can be a very expensive operation and usually
|
||||
does not make sense for delta-imports. (Robert Muir)
|
||||
|
||||
================== 3.5.0 ==================
|
||||
|
||||
Bug Fixes
|
||||
----------------------
|
||||
* SOLR-2875: Fix the incorrect url in tika-data-config.xml (Shinichiro Abe via koji)
|
||||
|
||||
================== 3.4.0 ==================
|
||||
|
||||
Bug Fixes
|
||||
----------------------
|
||||
* SOLR-2644: When using threads=2 the default logging is set too high (Bill Bell via shalin)
|
||||
* SOLR-2492: DIH does not commit if only deletes are processed (James Dyer via shalin)
|
||||
* SOLR-2186: DataImportHandler's multi-threaded option throws NPE (Lance Norskog, Frank Wesemann, shalin)
|
||||
* SOLR-2655: DIH multi threaded mode does not resolve attributes correctly (Frank Wesemann, shalin)
|
||||
* SOLR-2695: Documents are collected in unsynchronized list in multi-threaded debug mode (Michael McCandless, shalin)
|
||||
* SOLR-2668: DIH multithreaded mode does not rollback on errors from EntityProcessor (Frank Wesemann, shalin)
|
||||
|
||||
================== 3.3.0 ==================
|
||||
|
||||
* SOLR-2551: Check dataimport.properties for write access (if delta-import is supported
|
||||
in DIH configuration) before starting an import (C S, shalin)
|
||||
|
||||
================== 3.2.0 ==================
|
||||
|
||||
(No Changes)
|
||||
|
||||
================== 3.1.0 ==================
|
||||
Upgrading from Solr 1.4
|
||||
----------------------
|
||||
|
||||
Versions of Major Components
|
||||
---------------------
|
||||
|
||||
Detailed Change List
|
||||
----------------------
|
||||
|
||||
New Features
|
||||
----------------------
|
||||
|
||||
* SOLR-1525 : allow DIH to refer to core properties (noble)
|
||||
|
||||
* SOLR-1547 : TemplateTransformer copies objects more intelligently when the template is a single variable (noble)
|
||||
|
||||
* SOLR-1627 : VariableResolver should be fetched just in time (noble)
|
||||
|
||||
* SOLR-1583 : Create DataSources that return InputStream (noble)
|
||||
|
||||
* SOLR-1358 : Integration of Tika and DataImportHandler ( Akshay Ukey, noble)
|
||||
|
||||
* SOLR-1654 : TikaEntityProcessor example added to DIHExample (Akshay Ukey via noble)
|
||||
|
||||
* SOLR-1678 : Move onError handling to DIH framework (noble)
|
||||
|
||||
* SOLR-1352 : Multi-threaded implementation of DIH (noble)
|
||||
|
||||
* SOLR-1721 : Add explicit option to run DataImportHandler in synchronous mode (Alexey Serba via noble)
|
||||
|
||||
* SOLR-1737 : Added FieldStreamDataSource (noble)
|
||||
|
||||
Optimizations
|
||||
----------------------
|
||||
|
||||
* SOLR-2200: Improve the performance of DataImportHandler for large delta-import
|
||||
updates. (Mark Waddle via rmuir)
|
||||
|
||||
Bug Fixes
|
||||
----------------------
|
||||
* SOLR-1638: Fixed NullPointerException during import if uniqueKey is not specified
|
||||
in schema (Akshay Ukey via shalin)
|
||||
|
||||
* SOLR-1639: Fixed misleading error message when dataimport.properties is not writable (shalin)
|
||||
|
||||
* SOLR-1598: Reader used in PlainTextEntityProcessor is not explicitly closed (Sascha Szott via noble)
|
||||
|
||||
* SOLR-1759: $skipDoc was not working correctly (Gian Marco Tagliani via noble)
|
||||
|
||||
* SOLR-1762: DateFormatTransformer does not work correctly with non-default locale dates (tommy chheng via noble)
|
||||
|
||||
* SOLR-1757: DIH multithreading sometimes throws NPE (noble)
|
||||
|
||||
* SOLR-1766: DIH with threads enabled doesn't respond to the abort command (Michael Henson via noble)
|
||||
|
||||
* SOLR-1767: dataimporter.functions.escapeSql() does not escape backslash character (Sean Timm via noble)
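The escaping gap described in SOLR-1767 can be sketched in a few lines. This is a minimal Python illustration of the idea, not DIH's Java code: the function name escape_sql is hypothetical, and the set of characters handled besides the backslash (single and double quotes, doubled in the usual SQL style) is an assumption.

```python
def escape_sql(value: str) -> str:
    """Escape a value before it is substituted into a SQL string literal.

    Hypothetical sketch: backslashes are doubled (the case SOLR-1767 reports
    as previously missing); quote doubling is an assumed companion rule.
    """
    return (
        value.replace("\\", "\\\\")  # backslash -> two backslashes
        .replace("'", "''")          # single quote -> two single quotes
        .replace('"', '""')          # double quote -> two double quotes
    )
```

The three replacements do not interfere with each other, because none of them produces a character that a later replacement consumes.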
|
||||
|
||||
* SOLR-1811: formatDate should use the current NOW value always (Sean Timm via noble)
|
||||
|
||||
* SOLR-1794: Dataimport of CLOB fields fails when getCharacterStream() is
|
||||
defined in a superclass. (Gunnar Gauslaa Bergem via rmuir)
|
||||
|
||||
* SOLR-2057: DataImportHandler never calls UpdateRequestProcessor.finish()
|
||||
(Drew Farris via koji)
|
||||
|
||||
* SOLR-1973: Empty fields in XML update messages confuse DataImportHandler. (koji)
|
||||
|
||||
* SOLR-2221: Use StrUtils.parseBool() to get values of boolean options in DIH.
|
||||
true/on/yes (for TRUE) and false/off/no (for FALSE) can be used for sub-options
|
||||
(debug, verbose, synchronous, commit, clean, optimize) for full/delta-import commands. (koji)
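The accepted spellings listed in SOLR-2221 can be sketched as follows. This is a Python illustration under stated assumptions, not the StrUtils.parseBool() implementation: the names parse_bool and parse_import_options are hypothetical, and whole-token, case-insensitive matching is an assumption.

```python
# Sub-options named in the changelog entry.
BOOLEAN_OPTIONS = ("debug", "verbose", "synchronous", "commit", "clean", "optimize")


def parse_bool(value: str) -> bool:
    """Map true/on/yes to True and false/off/no to False.

    Assumption: matching is on the whole token and ignores case.
    """
    token = value.strip().lower()
    if token in ("true", "on", "yes"):
        return True
    if token in ("false", "off", "no"):
        return False
    raise ValueError(f"invalid boolean value: {value!r}")


def parse_import_options(params: dict) -> dict:
    """Parse only the boolean sub-options present in the request parameters."""
    return {name: parse_bool(params[name]) for name in BOOLEAN_OPTIONS if name in params}
```

With this sketch, a request carrying commit=yes and clean=off would yield commit as True and clean as False, while unrelated parameters are left alone.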
|
||||
|
||||
* SOLR-2310: getTimeElapsedSince() returns incorrect hour value when the elapse is over 60 hours
|
||||
(tom liu via koji)
|
||||
|
||||
* SOLR-2252: When a child entity in nested entities is rootEntity="true", delta-import doesn't work.
|
||||
(koji)
|
||||
|
||||
* SOLR-2330: solrconfig.xml files in example-DIH are broken. (Matt Parker, koji)
|
||||
|
||||
* SOLR-1191: resolve DataImportHandler deltaQuery column against pk when pk
|
||||
has a prefix (e.g. pk="book.id" deltaQuery="select id from ..."). More
|
||||
useful error reporting when no match found (previously failed with a
|
||||
NullPointerException in log and no clear user feedback). (gthb via yonik)
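The lookup described in SOLR-1191 can be sketched as below. This is a hypothetical Python illustration, not the DIH code: the function name find_pk_value and the rule "fall back to the part after the last dot" are assumptions chosen to match the pk="book.id" example in the entry.

```python
def find_pk_value(pk: str, row: dict):
    """Return the primary-key value from a deltaQuery row.

    Tries the declared pk first, then the unqualified column name
    (assumed rule: "book.id" falls back to "id"). Raises a descriptive
    error instead of failing later on a missing value.
    """
    if pk in row:
        return row[pk]
    short_name = pk.rsplit(".", 1)[-1]  # "book.id" -> "id"
    if short_name in row:
        return row[short_name]
    raise KeyError(
        f"deltaQuery returned no column matching pk '{pk}'; "
        f"columns present: {sorted(row)}"
    )
```

Raising an error that names both the declared pk and the columns actually returned is one way to give the clearer feedback the entry mentions.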
|
||||
|
||||
* SOLR-2116: Fix TikaConfig classloader bug in TikaEntityProcessor
|
||||
(Martijn van Groningen via hossman)
|
||||
|
||||
|
||||
Other Changes
|
||||
----------------------
|
||||
|
||||
* SOLR-1821: Fix TimeZone-dependent test failure in TestEvaluatorBag.
|
||||
(Chris Male via rmuir)
|
||||
|
||||
* SOLR-2367: Reduced noise in test output by ensuring the properties file can be written.
|
||||
(Gunnlaugur Thor Briem via rmuir)
|
||||
|
||||
|
||||
Build
|
||||
----------------------
|
||||
|
||||
|
||||
Documentation
|
||||
----------------------
|
||||
|
||||
================== Release 1.4.0 ==================
|
||||
|
||||
Upgrading from Solr 1.3
|
||||
-----------------------
|
||||
|
||||
Evaluator API has been changed in a non back-compatible way. Users who have developed custom Evaluators will need
|
||||
to change their code according to the new API for it to work. See SOLR-996 for details.
|
||||
|
||||
The formatDate evaluator's syntax has been changed. The new syntax is formatDate(<variable>, '<format_string>').
|
||||
For example, formatDate(x.date, 'yyyy-MM-dd'). In the old syntax, the date string was written without single quotes.
|
||||
The old syntax has been deprecated and will be removed in 1.5; until then, using the old syntax will log a warning.
|
||||
|
||||
The Context API has been changed in a non back-compatible way. In particular, the Context.currentProcess() method
|
||||
now returns a String describing the type of the current import process instead of an int. Similarly, the public
|
||||
constants in Context viz. FULL_DUMP, DELTA_DUMP and FIND_DELTA are changed to a String type. See SOLR-969 for details.
|
||||
|
||||
The EntityProcessor API has been simplified by moving logic for applying transformers and handling multi-row outputs
|
||||
from Transformers into an EntityProcessorWrapper class. The EntityProcessor#destroy is now called once per
|
||||
parent-row at the end of row (end of data). A new method EntityProcessor#close is added which is called at the end
|
||||
of import.
|
||||
|
||||
In Solr 1.3, if the last_index_time was not available (first import) and a delta-import was requested, a full-import
|
||||
was run instead. This is no longer the case. In Solr 1.4 delta import is run with last_index_time as the epoch
|
||||
date (January 1, 1970, 00:00:00 GMT) if last_index_time is not available.
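The Solr 1.4 fallback described above can be sketched as follows. This is a Python illustration under assumptions, not DIH's code: the function name is hypothetical, the properties are modelled as a plain dict, and the timestamp format and UTC handling are assumed for the example.

```python
from datetime import datetime, timezone

# Epoch date used when no previous import has been recorded.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"  # assumed storage format


def resolve_last_index_time(props: dict) -> datetime:
    """Return the stored last_index_time, or the epoch if it is missing."""
    raw = props.get("last_index_time")
    if raw is None:
        # First import: the delta still runs, comparing against the epoch.
        return EPOCH
    return datetime.strptime(raw, TIMESTAMP_FORMAT).replace(tzinfo=timezone.utc)
```

Because every real modification time is later than the epoch, a delta query filtered on this value would select all rows on the first run.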
|
||||
|
||||
Detailed Change List
|
||||
----------------------
|
||||
|
||||
New Features
|
||||
----------------------
|
||||
1. SOLR-768: Set last_index_time variable in full-import command.
|
||||
(Wojtek Piaseczny, Noble Paul via shalin)
|
||||
|
||||
2. SOLR-811: Allow a "deltaImportQuery" attribute in SqlEntityProcessor which is used for delta imports
|
||||
instead of DataImportHandler manipulating the SQL itself.
|
||||
(Noble Paul via shalin)
|
||||
|
||||
3. SOLR-842: Better error handling in DataImportHandler with options to abort, skip and continue imports.
|
||||
(Noble Paul, shalin)
|
||||
|
||||
4. SOLR-833: A DataSource to read data from a field as a reader. This can be used, for example, to read XMLs
|
||||
residing as CLOBs or BLOBs in databases.
|
||||
(Noble Paul via shalin)
|
||||
|
||||
5. SOLR-887: A Transformer to strip HTML tags.
|
||||
(Ahmed Hammad via shalin)
|
||||
|
||||
6. SOLR-886: DataImportHandler should rollback when an import fails or it is aborted
|
||||
(shalin)
|
||||
|
||||
7. SOLR-891: A Transformer to read strings from Clob type.
|
||||
(Noble Paul via shalin)
|
||||
|
||||
8. SOLR-812: Configurable JDBC settings in JdbcDataSource including optimized defaults for read only mode.
|
||||
(David Smiley, Glen Newton, shalin)
|
||||
|
||||
9. SOLR-910: Add a few utility commands to the DIH admin page such as full import, delta import, status, reload config.
|
||||
(Ahmed Hammad via shalin)
|
||||
|
||||
10.SOLR-938: Add event listener API for import start and end.
|
||||
(Kay Kay, Noble Paul via shalin)
|
||||
|
||||
11.SOLR-801: Add support for configurable pre-import and post-import delete query per root-entity.
|
||||
(Noble Paul via shalin)
|
||||
|
||||
12.SOLR-988: Add a new scope for session data stored in Context to store objects across imports.
|
||||
(Noble Paul via shalin)
|
||||
|
||||
13.SOLR-980: A PlainTextEntityProcessor which can read from any DataSource<Reader> and output a String.
|
||||
(Nathan Adams, Noble Paul via shalin)
|
||||
|
||||
14.SOLR-1003: XPathEntityprocessor must allow slurping all text from a given xml node and its children.
|
||||
(Noble Paul via shalin)
|
||||
|
||||
15.SOLR-1001: Allow variables in various attributes of RegexTransformer, HTMLStripTransformer
|
||||
and NumberFormatTransformer.
|
||||
(Fergus McMenemie, Noble Paul, shalin)
|
||||
|
||||
16.SOLR-989: Expose running statistics from the Context API.
|
||||
(Noble Paul, shalin)
|
||||
|
||||
17.SOLR-996: Expose Context to Evaluators.
|
||||
(Noble Paul, shalin)
|
||||
|
||||
18.SOLR-783: Enhance delta-imports by maintaining separate last_index_time for each entity.
|
||||
(Jon Baer, Noble Paul via shalin)
|
||||
|
||||
19.SOLR-1033: Current entity's namespace is made available to all Transformers. This allows one to use an output field
|
||||
of TemplateTransformer in other transformers, among other things.
|
||||
(Fergus McMenemie, Noble Paul via shalin)
|
||||
|
||||
20.SOLR-1066: New methods in Context to expose Script details. ScriptTransformer changed to read scripts
|
||||
through the new API methods.
|
||||
(Noble Paul via shalin)
|
||||
|
||||
21.SOLR-1062: A LogTransformer which can log data in a given template format.
|
||||
(Jon Baer, Noble Paul via shalin)
|
||||
|
||||
22.SOLR-1065: A ContentStreamDataSource which can accept HTTP POST data in a content stream. This can be used to
|
||||
push data to Solr instead of just pulling it from DB/Files/URLs.
|
||||
(Noble Paul via shalin)
|
||||
|
||||
23.SOLR-1061: Improve RegexTransformer to create multiple columns from regex groups.
|
||||
(Noble Paul via shalin)
|
||||
|
||||
24.SOLR-1059: Special flags introduced for deleting documents by query or id, skipping rows and stopping further
|
||||
transforms. Use $deleteDocById, $deleteDocByQuery for deleting by id and query respectively.
|
||||
Use $skipRow to skip the current row but continue with the document. Use $stopTransform to stop
|
||||
further transformers. New methods are introduced in Context for deleting by id and query.
|
||||
(Noble Paul, Fergus McMenemie, shalin)
|
||||
|
||||
25.SOLR-1076: JdbcDataSource should resolve variables in all its configuration parameters.
|
||||
(shalin)
|
||||
|
||||
26.SOLR-1055: Make DIH JdbcDataSource easily extensible by making the createConnectionFactory method protected and
|
||||
return a Callable<Connection> object.
|
||||
(Noble Paul, shalin)
|
||||
|
||||
27.SOLR-1058: JdbcDataSource can lookup javax.sql.DataSource using JNDI. Use a jndiName attribute to specify the
|
||||
location of the data source.
|
||||
(Jason Shepherd, Noble Paul via shalin)
|
||||
|
||||
28.SOLR-1083: An Evaluator for escaping query characters.
|
||||
(Noble Paul, shalin)
|
||||
|
||||
29.SOLR-934: A MailEntityProcessor to enable indexing mails from POP/IMAP sources into a solr index.
|
||||
(Preetam Rao, shalin)
|
||||
|
||||
30.SOLR-1060: A LineEntityProcessor which can stream lines of text from a given file to be indexed directly or
|
||||
for processing with transformers and child entities.
|
||||
(Fergus McMenemie, Noble Paul, shalin)
|
||||
|
||||
31.SOLR-1127: Add support for field name to be templatized.
|
||||
(Noble Paul, shalin)
|
||||
|
||||
32.SOLR-1092: Added a new command named 'import' which does not automatically clean the index. This is useful and
|
||||
more appropriate when one needs to import only some of the entities.
|
||||
(Noble Paul via shalin)
|
||||
|
||||
33.SOLR-1153: 'deltaImportQuery' is honored on child entities as well (noble)
|
||||
|
||||
34.SOLR-1230: Enhanced dataimport.jsp to work with all DataImportHandler request handler configurations,
|
||||
rather than just a hardcoded /dataimport handler. (ehatcher)
|
||||
|
||||
35.SOLR-1235: disallow period (.) in entity names (noble)
|
||||
|
||||
36.SOLR-1234: Multiple DIH does not work because all of them write to dataimport.properties.
|
||||
Use the handler name as the properties file name (noble)
|
||||
|
||||
37.SOLR-1348: Support binary field type in convertType logic in JdbcDataSource (shalin)
|
||||
|
||||
38.SOLR-1406: Make FileDataSource and FileListEntityProcessor more extensible (Luke Forehand, shalin)
|
||||
|
||||
39.SOLR-1437 : XPathEntityProcessor can deal with xpath syntaxes such as //tagname , /root//tagname (Fergus McMenemie via noble)
|
||||
|
||||
Optimizations
|
||||
----------------------
|
||||
1. SOLR-846: Reduce memory consumption during delta import by removing keys when used
|
||||
(Ricky Leung, Noble Paul via shalin)
|
||||
|
||||
2. SOLR-974: DataImportHandler skips commit if no data has been updated.
|
||||
(Wojtek Piaseczny, shalin)
|
||||
|
||||
3. SOLR-1004: Check for abort more frequently during delta-imports.
|
||||
(Marc Sturlese, shalin)
|
||||
|
||||
4. SOLR-1098: DateFormatTransformer can cache the format objects.
|
||||
(Noble Paul via shalin)
|
||||
|
||||
5. SOLR-1465: Replaced string concatenations with StringBuilder append calls in XPathRecordReader.
|
||||
(Mark Miller, shalin)
|
||||
|
||||
|
||||
Bug Fixes
|
||||
----------------------
|
||||
1. SOLR-800: Deep copy collections to avoid ConcurrentModificationException in XPathEntityprocessor while streaming
|
||||
(Kyle Morrison, Noble Paul via shalin)
|
||||
|
||||
2. SOLR-823: Request parameter variables ${dataimporter.request.xxx} are not resolved
|
||||
(Mck SembWever, Noble Paul, shalin)
|
||||
|
||||
3. SOLR-728: Add synchronization to avoid race condition of multiple imports working concurrently
|
||||
(Walter Ferrara, shalin)
|
||||
|
||||
4. SOLR-742: Add ability to create dynamic fields with custom DataImportHandler transformers
|
||||
(Wojtek Piaseczny, Noble Paul, shalin)
|
||||
|
||||
5. SOLR-832: Rows parameter is not honored in non-debug mode and can abort a running import in debug mode.
|
||||
(Akshay Ukey, shalin)
|
||||
|
||||
6. SOLR-838: The VariableResolver obtained from a DataSource's context does not have current data.
|
||||
(Noble Paul via shalin)
|
||||
|
||||
7. SOLR-864: DataImportHandler does not catch and log Errors (shalin)
|
||||
|
||||
8. SOLR-873: Fix case-sensitive field names and columns (Jon Baer, shalin)
|
||||
|
||||
9. SOLR-893: Unable to delete documents via SQL and deletedPkQuery with deltaimport
|
||||
(Dan Rosher via shalin)
|
||||
|
||||
10. SOLR-888: DateFormatTransformer cannot convert non-string type
|
||||
(Amit Nithian via shalin)
|
||||
|
||||
11. SOLR-841: DataImportHandler should throw exception if a field does not have column attribute
|
||||
(Michael Henson, shalin)
|
||||
|
||||
12. SOLR-884: CachedSqlEntityProcessor should check if the cache key is present in the query results
|
||||
(Noble Paul via shalin)
|
||||
|
||||
13. SOLR-985: Fix thread-safety issue with TemplateString for concurrent imports with multiple cores.
|
||||
(Ryuuichi Kumai via shalin)
|
||||
|
||||
14. SOLR-999: XPathRecordReader fails on XMLs with nodes mixed with CDATA content.
|
||||
(Fergus McMenemie, Noble Paul via shalin)
|
||||
|
||||
15.SOLR-1000: FileListEntityProcessor should not apply fileName filter to directory names.
|
||||
(Fergus McMenemie via shalin)
|
||||
|
||||
16.SOLR-1009: Repeated column names result in duplicate values.
|
||||
(Fergus McMenemie, Noble Paul via shalin)
|
||||
|
||||
17.SOLR-1017: Fix thread-safety issue with last_index_time for concurrent imports in multiple cores due to unsafe usage
|
||||
of SimpleDateFormat by multiple threads.
|
||||
(Ryuuichi Kumai via shalin)
|
||||
|
||||
18.SOLR-1024: Calling abort on DataImportHandler import commits data instead of calling rollback.
|
||||
(shalin)
|
||||
|
||||
19.SOLR-1037: DIH should not add null values in a row returned by EntityProcessor to documents.
|
||||
(shalin)
|
||||
|
||||
20.SOLR-1040: XPathEntityProcessor fails with an xpath like /feed/entry/link[@type='text/html']/@href
|
||||
(Noble Paul via shalin)
|
||||
|
||||
21.SOLR-1042: Fix memory leak in DIH by making TemplateString non-static member in VariableResolverImpl
|
||||
(Ryuuichi Kumai via shalin)
|
||||
|
||||
22.SOLR-1053: IndexOutOfBoundsException in SolrWriter.getResourceAsString when size of data-config.xml is a
|
||||
multiple of 1024 bytes.
|
||||
(Herb Jiang via shalin)
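The entry for SOLR-1053 does not describe the root cause, so the following is only a general illustration of the boundary condition it names: a chunked read loop must behave correctly when the input length is an exact multiple of the buffer size. The function name read_all and the 1024-byte default are assumptions for this Python sketch, which is not the actual fix.

```python
import io


def read_all(stream, chunk_size: int = 1024) -> str:
    """Read a binary stream to the end in fixed-size chunks and decode it as UTF-8.

    The loop ends only on an empty read, so an input whose size is an exact
    multiple of chunk_size gets one extra, empty read rather than an
    out-of-range access.
    """
    chunks = []
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:  # end of stream
            break
        chunks.append(chunk)
    # Decode once at the end so multi-byte characters split across chunks stay intact.
    return b"".join(chunks).decode("utf-8")
```

For example, read_all(io.BytesIO(data)) with a 2048-byte payload performs two full reads followed by one empty read.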
|
||||
|
||||
23.SOLR-1077: IndexOutOfBoundsException with useSolrAddSchema in XPathEntityProcessor.
|
||||
(Sam Keen, Noble Paul via shalin)
|
||||
|
||||
24.SOLR-1080: RegexTransformer should not replace if regex is not matched.
|
||||
(Noble Paul, Fergus McMenemie via shalin)
|
||||
|
||||
25.SOLR-1090: DataImportHandler should load the data-config.xml using UTF-8 encoding.
|
||||
(Rui Pereira, shalin)
|
||||
|
||||
26.SOLR-1146: ConcurrentModificationException in DataImporter.getStatusMessages
|
||||
(Walter Ferrara, Noble Paul via shalin)
|
||||
|
||||
27.SOLR-1229: Fixes for deletedPkQuery, particularly when using transformed Solr unique id's
|
||||
(Lance Norskog, Noble Paul via ehatcher)
|
||||
|
||||
28.SOLR-1286: Fix the commit parameter always defaulting to "true" even if "false" is explicitly passed in.
|
||||
(Jay Hill, Noble Paul via ehatcher)
|
||||
|
||||
29.SOLR-1323: Reset XPathEntityProcessor's $hasMore/$nextUrl when fetching next URL (noble, ehatcher)
|
||||
|
||||
30.SOLR-1450: Jdbc connection properties such as batchSize are not applied if the driver jar is placed
|
||||
in solr_home/lib.
|
||||
(Steve Sun via shalin)
|
||||
|
||||
31.SOLR-1474: Delta-import should run even if last_index_time is not set.
|
||||
(shalin)
|
||||
|
||||
|
||||
Documentation
|
||||
----------------------
|
||||
1. SOLR-1369: Add HSQLDB Jar to example-DIH, unzip database and update instructions.
|
||||
|
||||
Other
|
||||
----------------------
|
||||
1. SOLR-782: Refactored SolrWriter to make it a concrete class and removed wrappers over SolrInputDocument.
|
||||
Refactored to load Evaluators lazily. Removed multiple document nodes in the configuration xml.
|
||||
Removed support for 'default' variables, they are automatically available as request parameters.
|
||||
(Noble Paul via shalin)
|
||||
|
||||
2. SOLR-964: XPathEntityProcessor now ignores DTD validations
|
||||
(Fergus McMenemie, Noble Paul via shalin)
|
||||
|
||||
3. SOLR-1029: Standardize Evaluator parameter parsing and added helper functions for parsing all evaluator
|
||||
parameters in a standard way.
|
||||
(Noble Paul, shalin)
|
||||
|
||||
4. SOLR-1081: Change EventListener to be an interface so that components such as an EntityProcessor or a Transformer
|
||||
can act as an event listener.
|
||||
(Noble Paul, shalin)
|
||||
|
||||
5. SOLR-1027: Alias the 'dataimporter' namespace to a shorter name 'dih'.
|
||||
(Noble Paul via shalin)
|
||||
|
||||
6. SOLR-1084: Better error reporting when entity name is a reserved word and data-config.xml root node
|
||||
is not <dataConfig>.
|
||||
(Noble Paul via shalin)
|
||||
|
||||
7. SOLR-1087: Deprecate 'where' attribute in CachedSqlEntityProcessor in favor of cacheKey and cacheLookup.
|
||||
(Noble Paul via shalin)
|
||||
|
||||
8. SOLR-969: Change the FULL_DUMP, DELTA_DUMP, FIND_DELTA constants in Context to String.
|
||||
Change Context.currentProcess() to return a string instead of an integer.
|
||||
(Kay Kay, Noble Paul, shalin)
|
||||
|
||||
9. SOLR-1120: Simplified EntityProcessor API by moving logic for applying transformers and handling multi-row outputs
|
||||
from Transformers into an EntityProcessorWrapper class. The behavior of the method
|
||||
EntityProcessor#destroy has been modified to be called once per parent-row at the end of row. A new
|
||||
method EntityProcessor#close is added which is called at the end of import. A new method
|
||||
Context#getResolvedEntityAttribute is added which returns the resolved value of an entity's attribute.
|
||||
Introduced a DocWrapper which takes care of maintaining document level session variables.
|
||||
(Noble Paul, shalin)
|
||||
|
||||
10.SOLR-1265: Add variable resolving for URLDataSource properties like baseUrl. (Chris Eldredge via ehatcher)
|
||||
|
||||
11.SOLR-1269: Better error messages from JdbcDataSource when JDBC Driver name or SQL is incorrect.
|
||||
(ehatcher, shalin)
|
||||
|
||||
================== Release 1.3.0 ==================
|
||||
|
||||
Status
|
||||
------
|
||||
This is the first release since DataImportHandler was added to the contrib solr distribution.
|
||||
The following list covers changes since the code was introduced, not since
|
||||
the first official release.
|
||||
|
||||
|
||||
Detailed Change List
|
||||
--------------------
|
||||
|
||||
New Features
|
||||
1. SOLR-700: Allow configurable locales through a locale attribute in fields for NumberFormatTransformer.
|
||||
(Stefan Oestreicher, shalin)
|
||||
|
||||
Changes in runtime behavior
|
||||
|
||||
Bug Fixes
|
||||
1. SOLR-704: NumberFormatTransformer can silently ignore part of the string while parsing. Now it tries to
|
||||
use the complete string for parsing. Failure to do so will result in an exception.
|
||||
(Stefan Oestreicher via shalin)
|
||||
|
||||
2. SOLR-729: Context.getDataSource(String) gives current entity's DataSource instance regardless of argument.
|
||||
(Noble Paul, shalin)
|
||||
|
||||
3. SOLR-726: Jdbc Drivers and DataSources fail to load if placed in multicore sharedLib or core's lib directory.
|
||||
(Walter Ferrara, Noble Paul, shalin)
|
||||
|
||||
Other Changes
|
||||
|
||||
|
|
@ -1,3 +1,12 @@
|
|||
Apache Solr - DataImportHandler
|
||||
|
||||
Introduction
|
||||
------------
|
||||
DataImportHandler is a data import tool for Solr which makes importing data from Databases, XML files and
|
||||
HTTP data sources quick and easy.
|
||||
|
||||
Important Note
|
||||
--------------
|
||||
Although Solr strives to be agnostic of the Locale where the server is
|
||||
running, some code paths in DataImportHandler are known to depend on the
|
||||
System default Locale, Timezone, or Charset. It is recommended that when
|
||||
|
|