Lucene Change Log

$Id$

======================= Trunk (not yet released) =======================

Changes in runtime behavior

API Changes

 1. LUCENE-793: created new exceptions and added them to throws clause
    for many methods (all subclasses of IOException for backwards
    compatibility): index.StaleReaderException,
    index.CorruptIndexException, store.LockObtainFailedException.
    This was done to better call out the possible root causes of an
    IOException from these methods. (Mike McCandless)

 2. LUCENE-811: make SegmentInfos class, plus a few methods from related
    classes, package-private again (they were unnecessarily made public
    as part of LUCENE-701). (Mike McCandless)

 3. LUCENE-710: added optional autoCommit boolean to IndexWriter
    constructors. When this is false, index changes are not committed
    until the writer is closed. This gives explicit control over when
    a reader will see the changes. Also added optional custom
    deletion policy to explicitly control when prior commits are
    removed from the index. This is intended to allow applications to
    share an index over NFS by customizing when prior commits are
    deleted. (Mike McCandless)

 4. LUCENE-818: changed most public methods of IndexWriter,
    IndexReader (and its subclasses), FieldsReader and RAMDirectory to
    throw AlreadyClosedException if they are accessed after being
    closed. (Mike McCandless)

 5. LUCENE-834: Changed some access levels for certain Span classes to allow them
    to be overridden. They have been marked expert only and not for public
    consumption. (Grant Ingersoll)

 6. LUCENE-796: Removed calls to super.* from various get*Query methods in
    MultiFieldQueryParser, in order to allow sub-classes to override them.
    (Steven Parkes via Otis Gospodnetic)

 7. LUCENE-857: Removed caching from QueryFilter and deprecated QueryFilter
    in favour of QueryWrapperFilter or QueryWrapperFilter + CachingWrapperFilter
    combination when caching is desired.
    (Chris Hostetter, Otis Gospodnetic)

 8. LUCENE-869: Changed FSIndexInput and FSIndexOutput to inner classes of FSDirectory
    to enable extensibility of these classes. (Michael Busch)

 9. LUCENE-580: Added the public method reset() to TokenStream. This method does
    nothing by default, but may be overridden by subclasses to support consuming
    the TokenStream more than once. (Michael Busch)

10. LUCENE-580: Added a new constructor to Field that takes a TokenStream as
    argument, available as tokenStreamValue(). This is useful to avoid the need for
    "dummy analyzers" for pre-analyzed fields. (Karl Wettin, Michael Busch)

11. LUCENE-730: Added the new methods setAllowDocsOutOfOrder() and
    getAllowDocsOutOfOrder() to BooleanQuery. Deprecated the methods
    setUseScorer14() and getUseScorer14(). The optimization patch LUCENE-730
    (see Optimizations->3.) improves performance for certain queries but
    results in scoring out of docid order. This patch reverses that change, so
    now by default hit docs are scored in docid order unless
    setAllowDocsOutOfOrder(true) is explicitly called. This patch also
    re-enables the tests in QueryUtils that check for docid order.
    (Paul Elschot, Doron Cohen, Michael Busch)

12. LUCENE-888: Added Directory.openInput(File path, int bufferSize)
    to optionally specify the size of the read buffer. Also added
    BufferedIndexInput.setBufferSize(int) to change the buffer size.
    (Mike McCandless)

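Illustration for item 1 above (LUCENE-793). Because the new exceptions
subclass IOException, callers that already catch IOException keep working,
while newer callers can catch the more specific type first. The sketch below
is only a rough model of that idea: the two exception classes are local
stand-ins declared for the example (the real ones live in
org.apache.lucene.index and org.apache.lucene.store), and openIndex and
classify are hypothetical helpers, not Lucene methods.

```java
import java.io.IOException;

// Local stand-ins for illustration only, not the real Lucene classes.
class CorruptIndexException extends IOException {
    CorruptIndexException(String message) { super(message); }
}

class LockObtainFailedException extends IOException {
    LockObtainFailedException(String message) { super(message); }
}

final class ExceptionSketch {
    // Hypothetical helper that fails in one of two distinguishable ways.
    static void openIndex(boolean corrupt) throws IOException {
        if (corrupt) {
            throw new CorruptIndexException("checksum mismatch");
        }
        throw new LockObtainFailedException("write.lock is held");
    }

    // The specific catch clause must come before the general IOException one.
    static String classify(boolean corrupt) {
        try {
            openIndex(corrupt);
            return "ok";
        } catch (CorruptIndexException e) {
            return "corrupt: " + e.getMessage();
        } catch (IOException e) {
            // Older code that only knows IOException still lands here.
            return "io: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(classify(true));
        System.out.println(classify(false));
    }
}
```

Code written before this change, with only the IOException clause, would
handle both failures the same way it always did.
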
Bug fixes

 1. LUCENE-804: Fixed build.xml to pack a fully compilable src dist. (Doron Cohen)

 2. LUCENE-813: Leading wildcard fixed to work with trailing wildcard.
    Query parser modified to create a prefix query only for the case
    that there is a single trailing wildcard (and no additional wildcard
    or '?' in the query text). (Doron Cohen)

 3. LUCENE-812: Add no-argument constructors to NativeFSLockFactory
    and SimpleFSLockFactory. This enables all 4 builtin LockFactory
    implementations to be specified via the System property
    org.apache.lucene.store.FSDirectoryLockFactoryClass. (Mike McCandless)

 4. LUCENE-821: The new single-norm-file introduced by LUCENE-756
    failed to reduce the number of open descriptors since it was still
    opened once per field with norms. (yonik)

 5. LUCENE-823: Make sure internal file handles are closed when
    hitting an exception (eg disk full) while flushing deletes in
    IndexWriter's mergeSegments, and also during
    IndexWriter.addIndexes. (Mike McCandless)

 6. LUCENE-825: If directory is removed after
    FSDirectory.getDirectory() but before IndexReader.open you now get
    a FileNotFoundException like Lucene pre-2.1 (before this fix you
    got an NPE). (Mike McCandless)

 7. LUCENE-800: Removed backslash from the TERM_CHAR list in the queryparser,
    because the backslash is the escape character. Also changed the ESCAPED_CHAR
    list to contain all possible characters, because every character that
    follows a backslash should be considered as escaped. (Michael Busch)

 8. LUCENE-372: QueryParser.parse() now ensures that the entire input string
    is consumed. Now a ParseException is thrown if a query contains too many
    closing parentheses. (Andreas Neumann via Michael Busch)

 9. LUCENE-814: javacc build targets now fix line-end-style of generated files.
    Now also deleting all javacc generated files before calling javacc.
    (Steven Parkes, Doron Cohen)

10. LUCENE-829: close readers in contrib/benchmark. (Karl Wettin, Doron Cohen)

11. LUCENE-828: Minor fix for Term's equals().
    (Paul Cowan via Otis Gospodnetic)

12. LUCENE-846: Fixed: if IndexWriter is opened with autoCommit=false,
    and you call addIndexes, and hit an exception (eg disk full) then
    when IndexWriter rolls back its internal state this could corrupt
    the instance of IndexWriter (but, not the index itself) by
    referencing already deleted segments. This bug was only present
    in 2.2 (trunk), ie was never released. (Mike McCandless)

13. LUCENE-736: Fixed: sloppy phrase query with repeating terms matched wrong
    docs. For example, the query "B C B"~2 matched the doc "A B C D E".
    (Doron Cohen)

14. LUCENE-789: Fixed: custom similarity is ignored when using MultiSearcher (problem reported
    by Alexey Lef). Now the similarity applied by MultiSearcher.setSimilarity(sim) is being used.
    Note that, as before this fix, creating a MultiSearcher from Searchers for which custom similarity
    was set has no effect - it is masked by the similarity of the MultiSearcher. This is as
    designed, because MultiSearcher operates on Searchables (not Searchers). (Doron Cohen)

15. LUCENE-880: Fixed DocumentWriter to close the TokenStreams after it
    has written the postings. Then the resources associated with the
    TokenStreams can safely be released. (Michael Busch)

16. LUCENE-883: consecutive calls to Spellchecker.indexDictionary()
    won't insert terms twice anymore. (Daniel Naber)

17. LUCENE-881: QueryParser.escape() now also escapes the characters
    '|' and '&' which are part of the queryparser syntax. (Michael Busch)

18. LUCENE-886: Spellchecker clean up: exceptions are no longer printed to
    STDERR and ignored; they are re-thrown. Some javadoc improvements.
    (Daniel Naber)

19. LUCENE-698: FilteredQuery now takes the query boost into account for
    scoring. (Michael Busch)

20. LUCENE-763: Spellchecker: LuceneDictionary used to skip first word in
    enumeration. (Christian Mallwitz via Daniel Naber)

21. LUCENE-903: FilteredQuery explanation inaccuracy with boost.
    Explanation tests now "deep" check the explanation details.
    (Chris Hostetter, Doron Cohen)

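Illustration for items 7 and 17 above (LUCENE-800, LUCENE-881). Both rely on
the backslash being the escape character: any character that follows a
backslash is taken literally, so escaping user input means prefixing each
syntax character with a backslash. The sketch below shows that idea only.
The character set in SPECIAL is an assumption made for the example (it
includes '|' and '&' as item 17 describes) and is not copied from
QueryParser; escapeSketch is a hypothetical name.

```java
final class EscapeSketch {
    // Assumed set of syntax characters, for this example only.
    private static final String SPECIAL = "\\+-!():^[]\"{}~*?|&";

    // Prefixes every special character with a backslash.
    static String escapeSketch(String input) {
        StringBuilder out = new StringBuilder(input.length() + 8);
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (SPECIAL.indexOf(c) >= 0) {
                out.append('\\');
            }
            out.append(c);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(escapeSketch("a&b|c"));
    }
}
```

The backslash itself is in the set, so input that already contains a
backslash comes out with that backslash escaped as well.
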
New features

 1. LUCENE-759: Added two n-gram-producing TokenFilters.
    (Otis Gospodnetic)

 2. LUCENE-822: Added FieldSelector capabilities to Searchable for use with
    RemoteSearcher, and other Searchable implementations. (Mark Miller, Grant Ingersoll)

 3. LUCENE-755: Added the ability to store arbitrary binary metadata in the posting list.
    These metadata are called Payloads. For every position of a Token one Payload in the form
    of a variable length byte array can be stored in the prox file.
    Remark: The APIs introduced with this feature are in experimental state and thus
    contain appropriate warnings in the javadocs.
    (Michael Busch)

 4. LUCENE-834: Added BoostingTermQuery which can boost scores based on the
    values of a payload (see #3 above). (Grant Ingersoll)

 5. LUCENE-834: Similarity has a new method for scoring payloads called
    scorePayloads that can be overridden to take advantage of payload
    storage (see #3 above).

 6. LUCENE-834: Added isPayloadAvailable() onto TermPositions interface and
    implemented it in the appropriate places. (Grant Ingersoll)

 7. LUCENE-853: Added RemoteCachingWrapperFilter to enable caching of Filters
    on the remote side of the RMI connection.
    (Matt Ericson via Otis Gospodnetic)

 8. LUCENE-446: Added Solr's search.function for scores based on field
    values, plus CustomScoreQuery for simple score (post) customization.
    (Yonik Seeley, Doron Cohen)

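Illustration for item 1 above (LUCENE-759). For readers unfamiliar with the
term, a character n-gram is simply a substring of fixed length n taken at
every start position of a term. The sketch below produces them from a plain
String; it does not use the TokenFilter API, and the class and method names
are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class NGramSketch {
    // Returns all substrings of length minGram..maxGram, shorter grams first.
    static List<String> ngrams(String term, int minGram, int maxGram) {
        if (minGram < 1 || maxGram < minGram) {
            throw new IllegalArgumentException("need 1 <= minGram <= maxGram");
        }
        List<String> grams = new ArrayList<>();
        for (int n = minGram; n <= maxGram; n++) {
            for (int start = 0; start + n <= term.length(); start++) {
                grams.add(term.substring(start, start + n));
            }
        }
        return grams;
    }

    public static void main(String[] args) {
        // Prints [ab, bc, cd, abc, bcd]
        System.out.println(ngrams("abcd", 2, 3));
    }
}
```

A term shorter than minGram yields an empty list in this sketch.
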
Optimizations

 1. LUCENE-761: The proxStream is now cloned lazily in SegmentTermPositions
    when nextPosition() is called for the first time. This allows using instances
    of SegmentTermPositions instead of SegmentTermDocs without additional costs.
    (Michael Busch)

 2. LUCENE-431: RAMInputStream and RAMOutputStream extend IndexInput and
    IndexOutput directly now. This avoids further buffering and thus avoids
    unnecessary array copies. (Michael Busch)

 3. LUCENE-730: Updated BooleanScorer2 to make use of BooleanScorer in some
    cases and possibly improve scoring performance. Documents can now be
    delivered out-of-order as they are scored (e.g. to HitCollector).
    N.B. A bit of code had to be disabled in QueryUtils in order for
    TestBoolean2 test to keep passing.
    (Paul Elschot via Otis Gospodnetic)

 4. LUCENE-882: Spellchecker doesn't store the ngrams anymore but only indexes
    them to keep the spell index small. (Daniel Naber)

 5. LUCENE-430: Delay allocation of the buffer after a clone of BufferedIndexInput.
    Together with LUCENE-888 this will allow adjusting the buffer size
    dynamically. (Paul Elschot, Michael Busch)

 6. LUCENE-888: Increase buffer sizes inside CompoundFileWriter and
    BufferedIndexOutput. Also increase buffer size in
    BufferedIndexInput, but only when used during merging. Together,
    these increases yield 10-18% overall performance gain vs the
    previous 1K defaults. (Mike McCandless)

 7. LUCENE-866: Adds multi-level skip lists to the posting lists. This speeds
    up most queries that use skipTo(), especially on big indexes with large posting
    lists. For average AND queries the speedup is about 20%; for queries that
    contain both very frequent and very rare terms the speedup can be over 80%.
    (Michael Busch)

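Illustration for item 7 above (LUCENE-866). The sketch below is a toy
in-memory model of why several skip levels help skipTo(): the walk takes
large strides on the coarsest level first and only then drops to finer
levels. It is not Lucene's on-disk format or code; the class name, the skip
interval of 10 and the level counts are all assumptions chosen for the
example.

```java
final class SkipListSketch {
    private final int[] docs;   // sorted, strictly increasing doc ids
    private final int interval; // assumed skip interval, chosen for the sketch
    private final int levels;   // number of skip levels above the plain list
    int steps;                  // forward jumps made by the last skipTo call

    SkipListSketch(int[] docs, int interval, int levels) {
        this.docs = docs;
        this.interval = interval;
        this.levels = levels;
    }

    // Returns the index of the first doc >= target, or docs.length if none.
    int skipTo(int target) {
        steps = 0;
        int pos = 0;
        // Walk from the coarsest level down to level 0 (stride 1).
        for (int level = levels; level >= 0; level--) {
            int stride = 1;
            for (int i = 0; i < level; i++) {
                stride *= interval;
            }
            // Jump while the entry one stride ahead is still below target.
            while (pos + stride < docs.length && docs[pos + stride] < target) {
                pos += stride;
                steps++;
            }
        }
        // pos is now the last doc below target, unless no doc is below it.
        if (docs.length > 0 && docs[pos] < target) {
            pos++;
        }
        return pos;
    }

    public static void main(String[] args) {
        int[] docs = new int[1000];
        for (int i = 0; i < docs.length; i++) {
            docs[i] = 2 * i;
        }
        SkipListSketch multi = new SkipListSketch(docs, 10, 2);
        SkipListSketch flat = new SkipListSketch(docs, 10, 0);
        System.out.println(multi.skipTo(1974) + " in " + multi.steps + " jumps");
        System.out.println(flat.skipTo(1974) + " in " + flat.steps + " jumps");
    }
}
```

On this made-up list of 1000 entries, both walks land on the same index, but
the two-level walk needs a couple of dozen jumps where the plain scan needs
nearly a thousand. The percentages quoted in item 7 come from the change log
and are not reproduced by this toy.
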
Documentation

 1. LUCENE-791 && INFRA-1173: Infrastructure moved the Wiki to
    http://wiki.apache.org/lucene-java/ and the links in the docs, and
    wherever else I found references, were updated.
    (Grant Ingersoll, Joe Schaefer)

 2. LUCENE-807: Fixed the javadoc for ScoreDocComparator.compare() to be
    consistent with java.util.Comparator.compare(): Any integer is allowed to
    be returned instead of only -1/0/1.
    (Paul Cowan via Michael Busch)

 3. LUCENE-875: Solved javadoc warnings & errors under jdk1.4.
    Solved javadoc errors under jdk5 (jars in path for gdata).
    Made "javadocs" target depend on "build-contrib" for first downloading
    contrib jars configured for dynamic download. (Note: when running
    behind a firewall, a firewall prompt might pop up.) (Doron Cohen)

 4. LUCENE-740: Added SNOWBALL-LICENSE.txt to the snowball package and a
    remark about the license to NOTICE.TXT. (Steven Parkes via Michael Busch)

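Illustration for item 2 above (LUCENE-807). The practical consequence of the
java.util.Comparator contract is that callers should test only the sign of
the result. The sketch below uses a plain java.util.Comparator over Strings
rather than ScoreDocComparator, and the names are hypothetical.

```java
import java.util.Arrays;
import java.util.Comparator;

final class CompareSketch {
    // Orders strings by length. The result is the raw difference, so it can
    // be any negative or positive integer, not just -1 or 1.
    static final Comparator<String> BY_LENGTH = new Comparator<String>() {
        public int compare(String a, String b) {
            return a.length() - b.length();
        }
    };

    // Callers should look only at the sign of the result.
    static String relation(String a, String b) {
        int c = BY_LENGTH.compare(a, b);
        return c < 0 ? "before" : (c > 0 ? "after" : "same");
    }

    public static void main(String[] args) {
        String[] words = {"delta", "a", "abc"};
        Arrays.sort(words, BY_LENGTH);
        System.out.println(Arrays.toString(words) + " " + relation("a", "abcd"));
    }
}
```

Code that checked for exactly -1 would misread the -3 returned for "a"
against "abcd", which is the kind of mistake the corrected javadoc guards
against.
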
Build

 1. LUCENE-802: Added LICENSE.TXT and NOTICE.TXT to Lucene jars.
    (Steven Parkes via Michael Busch)

 2. LUCENE-885: "ant test" now includes all contrib tests. The new
    "ant test-core" target can be used to run only the Core (non
    contrib) tests.
    (Chris Hostetter)

 3. LUCENE-900: "ant test" now enables Java assertions (in Lucene packages).
    (Doron Cohen)

 4. LUCENE-894: Add custom build file for binary distributions that includes
    targets to build the demos. (Chris Hostetter, Michael Busch)

 5. LUCENE-904: The "package" targets in build.xml now also generate .md5
    checksum files. (Chris Hostetter, Michael Busch)

 6. LUCENE-907: Include LICENSE.TXT and NOTICE.TXT in the META-INF dirs of
    demo war, demo jar, and the contrib jars. (Michael Busch)

 7. LUCENE-909: Demo targets for running the demo. (Doron Cohen)

 8. LUCENE-908: Improves content of MANIFEST file and makes it customizable
    for the contribs. Adds SNOWBALL-LICENSE.txt to META-INF of the snowball
    jar and makes sure that the lucli jar contains LICENSE.txt and NOTICE.txt.
    (Chris Hostetter, Michael Busch)

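Illustration for item 5 above (LUCENE-904). A downloaded artifact can be
checked against a published MD5 value by hashing its bytes and comparing hex
strings. The sketch below shows one way to compute such a digest with the
JDK's java.security.MessageDigest; the layout of the generated .md5 files is
not described in this log, so reading and comparing the file is left out,
and md5Hex is a hypothetical helper name.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class Md5Sketch {
    // Returns the MD5 digest of data as 32 lowercase hex characters.
    static String md5Hex(byte[] data) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            StringBuilder hex = new StringBuilder();
            for (byte b : md.digest(data)) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to provide MD5.
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    public static void main(String[] args) {
        // Prints 900150983cd24fb0d6963f7d28e17f72 (the RFC 1321 test value).
        System.out.println(md5Hex("abc".getBytes(StandardCharsets.US_ASCII)));
    }
}
```
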
======================= Release 2.1.0 2007-02-14 =======================

Changes in runtime behavior

 1. 's' and 't' have been removed from the list of default stopwords
    in StopAnalyzer (also used by StandardAnalyzer). Having e.g. 's'
    as a stopword meant that 's-class' led to the same results as 'class'.
    Note that this problem still exists for 'a', e.g. in 'a-class' as
    'a' continues to be a stopword.
    (Daniel Naber)

 2. LUCENE-478: Updated the list of Unicode code point ranges for CJK
    (now split into CJ and K) in StandardAnalyzer. (John Wang and
    Steven Rowe via Otis Gospodnetic)

 3. Modified some CJK Unicode code point ranges in StandardTokenizer.jj,
    and added a few more of them to increase CJK character coverage.
    Also documented some of the ranges.
    (Otis Gospodnetic)

 4. LUCENE-489: Add support for leading wildcard characters (*, ?) to
    QueryParser. Default is to disallow them, as before.
    (Steven Parkes via Otis Gospodnetic)

 5. LUCENE-703: QueryParser changed to default to use of ConstantScoreRangeQuery
    for range queries. Added useOldRangeQuery property to QueryParser to allow
    selection of old RangeQuery class if required.
    (Mark Harwood)

 6. LUCENE-543: WildcardQuery now performs a TermQuery if the provided term
    does not contain a wildcard character (? or *), when previously a
    StringIndexOutOfBoundsException was thrown.
    (Michael Busch via Erik Hatcher)

 7. LUCENE-726: Removed the use of deprecated doc.fields() method and
    Enumeration.
    (Michael Busch via Otis Gospodnetic)

 8. LUCENE-436: Removed finalize() in TermInfosReader and SegmentReader,
    and added a call to enumerators.remove() in TermInfosReader.close().
    The finalize() overrides were added to help with a pre-1.4.2 JVM bug
    that has since been fixed, plus we no longer support pre-1.4.2 JVMs.
    (Otis Gospodnetic)

 9. LUCENE-771: The default location of the write lock is now the
    index directory, and is named simply "write.lock" (without a big
    digest prefix). The system properties "org.apache.lucene.lockDir"
    and "java.io.tmpdir" are no longer used as the global directory
    for storing lock files, and the LOCK_DIR field of FSDirectory is
    now deprecated. (Mike McCandless)

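Illustration for item 1 above (default stopwords). The effect described
there can be reproduced with a deliberately simplified analysis step:
split on non-letters, lowercase, drop stopwords. This is an assumption-laden
stand-in for StopAnalyzer and StandardAnalyzer, not their actual behavior,
and the stopword sets below are cut down to the words the item mentions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

final class StopwordSketch {
    // Splits on non-letter characters and drops stopwords; a simplification
    // of real analysis, used only to show the effect described in item 1.
    static List<String> analyze(String text, Set<String> stopwords) {
        List<String> tokens = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z]+")) {
            if (!token.isEmpty() && !stopwords.contains(token)) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        Set<String> oldStops = new HashSet<>(Arrays.asList("a", "s", "t"));
        Set<String> newStops = new HashSet<>(Arrays.asList("a"));
        System.out.println(analyze("s-class", oldStops)); // [class]
        System.out.println(analyze("s-class", newStops)); // [s, class]
        System.out.println(analyze("a-class", newStops)); // [class]
    }
}
```

The last line shows the remaining 'a-class' case that the item warns about.
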
New features

 1. LUCENE-503: New ThaiAnalyzer and ThaiWordFilter in contrib/analyzers
    (Samphan Raruenrom via Chris Hostetter)

 2. LUCENE-545: New FieldSelector API and associated changes to
    IndexReader and implementations. New Fieldable interface for use
    with the lazy field loading mechanism. (Grant Ingersoll and Chuck
    Williams via Grant Ingersoll)

 3. LUCENE-676: Move Solr's PrefixFilter to Lucene core. (Yura
    Smolsky, Yonik Seeley)

 4. LUCENE-678: Added NativeFSLockFactory, which implements locking
    using OS native locking (via java.nio.*). (Michael McCandless via
    Yonik Seeley)

 5. LUCENE-544: Added the ability to specify different boosts for
    different fields when using MultiFieldQueryParser (Matt Ericson
    via Otis Gospodnetic)

 6. LUCENE-528: New IndexWriter.addIndexesNoOptimize() that doesn't
    optimize the index when adding new segments, only performing
    merges as needed. (Ning Li via Yonik Seeley)

 7. LUCENE-573: QueryParser now allows backslash escaping in
    quoted terms and phrases. (Michael Busch via Yonik Seeley)

 8. LUCENE-716: QueryParser now allows specification of unicode
    characters in terms via a unicode escape of the form \uXXXX
    (Michael Busch via Yonik Seeley)

 9. LUCENE-709: Added RAMDirectory.sizeInBytes(), IndexWriter.ramSizeInBytes()
    and IndexWriter.flushRamSegments(), allowing applications to
    control the amount of memory used to buffer documents.
    (Chuck Williams via Yonik Seeley)

10. LUCENE-723: QueryParser now parses *:* as MatchAllDocsQuery
    (Yonik Seeley)

11. LUCENE-741: Command-line utility for modifying or removing norms
    on fields in an existing index. This is mostly based on LUCENE-496
    and lives in contrib/miscellaneous.
    (Chris Hostetter, Otis Gospodnetic)

12. LUCENE-759: Added NGramTokenizer and EdgeNGramTokenizer classes and
    their passing unit tests.
    (Otis Gospodnetic)

13. LUCENE-565: Added methods to IndexWriter to more efficiently
    handle updating documents (the "delete then add" use case). This
    is intended to be an eventual replacement for the existing
    IndexModifier. Added IndexWriter.flush() (renamed from
    flushRamSegments()) to flush all pending updates (held in RAM), to
    the Directory. (Ning Li via Mike McCandless)

14. LUCENE-762: Added in SIZE and SIZE_AND_BREAK FieldSelectorResult options
    which allow one to retrieve the size of a field without retrieving the
    actual field. (Chuck Williams via Grant Ingersoll)

15. LUCENE-799: Properly handle lazy, compressed fields.
    (Mike Klaas via Grant Ingersoll)

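Illustration for items 7 and 8 above (LUCENE-573, LUCENE-716). The sketch
below shows how a term containing backslash escapes and unicode escapes of
the form \uXXXX might be decoded. It is a standalone approximation written
for this note, not the QueryParser implementation; the class and method
names are hypothetical, and error handling is an assumption.

```java
final class UnicodeEscapeSketch {
    // Decodes backslash-u escapes (four hex digits) into the character they
    // name, and drops the backslash in front of any other escaped character.
    static String unescape(String input) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (c != '\\') {
                out.append(c);
                i++;
            } else if (i + 1 >= input.length()) {
                throw new IllegalArgumentException("dangling backslash");
            } else if (input.charAt(i + 1) == 'u') {
                if (i + 6 > input.length()) {
                    throw new IllegalArgumentException("truncated unicode escape");
                }
                // parseInt rejects non-hex digits with NumberFormatException.
                String hex = input.substring(i + 2, i + 6);
                out.append((char) Integer.parseInt(hex, 16));
                i += 6;
            } else {
                out.append(input.charAt(i + 1));
                i += 2;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The Java literal below is the 8 characters: backslash u 0 0 4 1 b c
        System.out.println(unescape("\\u0041bc"));
    }
}
```
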
API Changes

 1. LUCENE-438: Remove "final" from Token, implement Cloneable, allow
    changing of termText via setTermText(). (Yonik Seeley)

 2. org.apache.lucene.analysis.nl.WordlistLoader has been deprecated
    and is supposed to be replaced with the WordlistLoader class in
    package org.apache.lucene.analysis (Daniel Naber)

 3. LUCENE-609: Revert return type of Document.getField(s) to Field
    for backward compatibility, added new Document.getFieldable(s)
    for access to new lazy loaded fields. (Yonik Seeley)

 4. LUCENE-608: Document.fields() has been deprecated and a new method
    Document.getFields() has been added that returns a List instead of
    an Enumeration (Daniel Naber)

 5. LUCENE-605: New Explanation.isMatch() method and new ComplexExplanation
    subclass allows explain methods to produce Explanations which model
    "matching" independent of having a positive value.
    (Chris Hostetter)

 6. LUCENE-621: New static methods IndexWriter.setDefaultWriteLockTimeout
    and IndexWriter.setDefaultCommitLockTimeout for overriding default
    timeout values for all future instances of IndexWriter (as well
    as for any other classes that may reference the static values,
    ie: IndexReader).
    (Michael McCandless via Chris Hostetter)

 7. LUCENE-638: FSDirectory.list() now only returns the directory's
    Lucene-related files. Thanks to this change one can now construct
    a RAMDirectory from a file system directory that contains files
    not related to Lucene.
    (Simon Willnauer via Daniel Naber)

 8. LUCENE-635: Decoupling locking implementation from Directory
    implementation. Added set/getLockFactory to Directory and moved
    all locking code into subclasses of abstract class LockFactory.
    FSDirectory and RAMDirectory still default to their prior locking
    implementations, but now you can mix & match, for example using
    SingleInstanceLockFactory (ie, in memory locking) with an
    FSDirectory. Note that now you must call setDisableLocks before
    instantiating an FSDirectory if you wish to disable locking
    for that Directory.
    (Michael McCandless, Jeff Patterson via Yonik Seeley)

 9. LUCENE-657: Made FuzzyQuery non-final and inner ScoreTerm protected.
    (Steven Parkes via Otis Gospodnetic)

10. LUCENE-701: Lockless commits: a commit lock is no longer required
    when a writer commits and a reader opens the index. This includes
    a change to the index file format (see docs/fileformats.html for
    details). It also removes all APIs associated with the commit
    lock & its timeout. Readers are now truly read-only and do not
    block one another on startup. This is the first step to getting
    Lucene to work correctly over NFS (second step is
    LUCENE-710). (Mike McCandless)

11. LUCENE-722: DEFAULT_MIN_DOC_FREQ was misspelled DEFALT_MIN_DOC_FREQ
    in Similarity's MoreLikeThis class. The misspelling has been
    replaced by the correct spelling.
    (Andi Vajda via Daniel Naber)

12. LUCENE-738: Reduce the size of the file that keeps track of which
    documents are deleted when the number of deleted documents is
    small. This changes the index file format and cannot be
    read by previous versions of Lucene. (Doron Cohen via Yonik Seeley)

13. LUCENE-756: Maintain all norms in a single .nrm file to reduce the
    number of open files and file descriptors for the non-compound index
    format. This changes the index file format, but maintains the
    ability to read and update older indices. The first segment merge
    on an older format index will create a single .nrm file for the new
    segment. (Doron Cohen via Yonik Seeley)

14. LUCENE-732: DateTools support has been added to QueryParser, with
    setters for both the default Resolution, and per-field Resolution.
    For backwards compatibility, DateField is still used if no Resolutions
    are specified. (Michael Busch via Chris Hostetter)

15. Added isOptimized() method to IndexReader.
    (Otis Gospodnetic)

16. LUCENE-773: Deprecate the FSDirectory.getDirectory(*) methods that
    take a boolean "create" argument. Instead you should use
    IndexWriter's "create" argument to create a new index.
    (Mike McCandless)

17. LUCENE-780: Add a static Directory.copy() method to copy files
    from one Directory to another. (Jiri Kuhn via Mike McCandless)

18. LUCENE-773: Added Directory.clearLock(String name) to forcefully
    remove an old lock. The default implementation is to ask the
    lockFactory (if non null) to clear the lock. (Mike McCandless)

19. LUCENE-795: Directory.renameFile() has been deprecated as it is
    not used anymore inside Lucene. (Daniel Naber)

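Illustration for item 8 above (LUCENE-635). The point of the change is that
a directory no longer hard-codes how it locks; it asks a pluggable factory.
The sketch below models that separation with pared-down, hypothetical
interfaces (LockSketch, LockFactorySketch, DirectorySketch). These are not
the Lucene types and their method sets are assumptions; only the shape of
the delegation is meant to carry over.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical, pared-down interfaces; the real Lucene types differ.
interface LockSketch {
    boolean obtain();
    void release();
}

interface LockFactorySketch {
    LockSketch makeLock(String name);
}

// In-memory locking: held lock names are tracked in a set shared by all
// locks created from the same factory instance.
final class InMemoryLockFactorySketch implements LockFactorySketch {
    private final Set<String> held = new HashSet<>();

    public LockSketch makeLock(final String name) {
        return new LockSketch() {
            public boolean obtain() {
                synchronized (held) {
                    return held.add(name); // false if already held
                }
            }
            public void release() {
                synchronized (held) {
                    held.remove(name);
                }
            }
        };
    }
}

// A directory that delegates locking to whichever factory it is given.
final class DirectorySketch {
    private LockFactorySketch lockFactory;

    DirectorySketch(LockFactorySketch lockFactory) {
        this.lockFactory = lockFactory;
    }

    void setLockFactory(LockFactorySketch lockFactory) {
        this.lockFactory = lockFactory;
    }

    LockSketch makeLock(String name) {
        return lockFactory.makeLock(name);
    }

    public static void main(String[] args) {
        DirectorySketch dir = new DirectorySketch(new InMemoryLockFactorySketch());
        LockSketch first = dir.makeLock("write.lock");
        LockSketch second = dir.makeLock("write.lock");
        System.out.println(first.obtain() + " " + second.obtain());
    }
}
```

Swapping in a different factory changes the locking strategy without
touching the directory class, which is the "mix & match" the item describes.
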
Bug fixes

 1. Fixed the web application demo (built with "ant war-demo") which
    didn't work because it used a QueryParser method that had
    been removed (Daniel Naber)

 2. LUCENE-583: ISOLatin1AccentFilter fails to preserve positionIncrement
    (Yonik Seeley)

 3. LUCENE-575: SpellChecker min score is incorrectly changed by suggestSimilar
    (Karl Wettin via Yonik Seeley)

 4. LUCENE-587: Explanation.toHtml was producing malformed HTML
    (Chris Hostetter)

 5. Fix to allow MatchAllDocsQuery to be used with RemoteSearcher (Yonik Seeley)

 6. LUCENE-601: RAMDirectory and RAMFile made Serializable
    (Karl Wettin via Otis Gospodnetic)

 7. LUCENE-557: Fixes to BooleanQuery and FilteredQuery so that the score
    Explanations match up with the real scores.
    (Chris Hostetter)

 8. LUCENE-607: ParallelReader's TermEnum fails to advance properly to
    new fields (Chuck Williams, Christian Kohlschuetter via Yonik Seeley)

 9. LUCENE-610,LUCENE-611: Simple syntax changes to allow compilation with ecj:
    disambiguate inner class scorer's use of doc() in BooleanScorer2,
    other test code changes. (DM Smith via Yonik Seeley)

10. LUCENE-451: All core query types now use ComplexExplanations so that
    boosts of zero don't confuse the BooleanWeight explain method.
    (Chris Hostetter)

11. LUCENE-593: Fixed LuceneDictionary's inner Iterator
    (Kåre Fiedler Christiansen via Otis Gospodnetic)

12. LUCENE-641: fixed an off-by-one bug with IndexWriter.setMaxFieldLength()
    (Daniel Naber)

13. LUCENE-659: Make PerFieldAnalyzerWrapper delegate getPositionIncrementGap()
|
|
|
|
to the correct analyzer for the field. (Chuck Williams via Yonik Seeley)
|
|
|
|
|
2006-08-20 17:17:59 -04:00
|
|
|
14. LUCENE-650: Fixed NPE in Locale specific String Sort when Document
|
|
|
|
has no value.
|
|
|
|
(Oliver Hutchison via Chris Hostetter)
|
2006-10-09 13:26:49 -04:00
|
|
|
|
2006-10-15 18:37:52 -04:00
|
|
|
15. LUCENE-683: Fixed data corruption when reading lazy loaded fields.
|
|
|
|
(Yonik Seeley)
|
2006-10-10 10:57:25 -04:00
|
|
|
|
2006-10-19 17:03:22 -04:00
|
|
|
16. LUCENE-678: Fixed bug in NativeFSLockFactory which caused the same
|
|
|
|
lock to be shared between different directories.
|
|
|
|
(Michael McCandless via Yonik Seeley)
|
|
|
|
|
2006-10-19 23:08:02 -04:00
|
|
|
17. LUCENE-690: Fixed thread unsafe use of IndexInput by lazy loaded fields.
|
|
|
|
(Yonik Seeley)
|
|
|
|
|
2006-10-25 00:20:34 -04:00
|
|
|
18. LUCENE-696: Fix bug when scorer for DisjunctionMaxQuery has skipTo()
|
|
|
|
called on it before next(). (Yonik Seeley)
|
|
|
|
|
2006-10-27 18:03:53 -04:00
|
|
|
19. LUCENE-569: Fixed SpanNearQuery bug, for 'inOrder' queries it would fail
|
|
|
|
to recognize ordered spans if they overlapped with unordered spans.
|
|
|
|
(Paul Elschot via Chris Hostetter)
|
|
|
|
|
2006-11-20 02:10:04 -05:00
|
|
|
20. LUCENE-706: Updated fileformats.xml|html concerning the docdelta value
|
|
|
|
in the frequency file. (Johan Stuyts, Doron Cohen via Grant Ingersoll)
|
2006-11-05 21:20:39 -05:00
|
|
|
|
2006-11-17 17:34:17 -05:00
|
|
|
21. LUCENE-715: Fixed private constructor in IndexWriter.java to
|
|
|
|
properly release the acquired write lock if there is an
|
|
|
|
IOException after acquiring the write lock but before finishing
|
2006-11-20 02:10:04 -05:00
|
|
|
instantiation. (Matthew Bogosian via Mike McCandless)
|
|
|
|
|
2006-11-20 09:51:50 -05:00
|
|
|
22. LUCENE-651: Multiple different threads requesting the same
|
|
|
|
FieldCache entry (often for Sorting by a field) at the same
|
|
|
|
time caused multiple generations of that entry, which was
|
|
|
|
detrimental to performance and memory use.
|
2006-11-20 02:10:04 -05:00
|
|
|
(Oliver Hutchison via Otis Gospodnetic)
|
2006-11-17 17:34:17 -05:00
|
|
|
|
2006-11-28 15:33:00 -05:00
|
|
|
23. LUCENE-717: Fixed build.xml not to fail when there is no lib dir.
|
|
|
|
(Doron Cohen via Otis Gospodnetic)
|
|
|
|
|
2006-11-28 15:46:42 -05:00
|
|
|
24. LUCENE-728: Removed duplicate/old MoreLikeThis and SimilarityQueries
|
|
|
|
classes from contrib/similarity, as their new home is under
|
|
|
|
contrib/queries.
|
2006-12-20 14:42:21 -05:00
|
|
|
(Otis Gospodnetic)
|
2006-11-28 15:46:42 -05:00
|
|
|
|
2006-11-29 19:07:46 -05:00
|
|
|
25. LUCENE-669: Do not double-close the RandomAccessFile in
|
|
|
|
FSIndexInput/Output during finalize(). Besides sending an
|
|
|
|
IOException up to the GC, this may also be the cause of intermittent
|
|
|
|
"The handle is invalid" IOExceptions on Windows when trying to
|
2006-12-19 06:31:27 -05:00
|
|
|
close readers or writers. (Michael Busch via Mike McCandless)
|
2006-11-29 19:07:46 -05:00
|
|
|
|
2006-12-18 11:45:29 -05:00
|
|
|
26. LUCENE-702: Fix IndexWriter.addIndexes(*) to not corrupt the index
|
|
|
|
on any exceptions (eg disk full). The semantics of these methods
|
|
|
|
are now transactional: either all indices are merged or none are.
|
|
|
|
Also fixed IndexWriter.mergeSegments (called outside of
|
|
|
|
addIndexes(*) by addDocument, optimize, flushRamSegments) and
|
|
|
|
IndexReader.commit() (called by close) to clean up and keep the
|
|
|
|
instance state consistent with what's actually in the index (Mike
|
|
|
|
McCandless).
|
|
|
|
|
2006-12-19 06:31:27 -05:00
|
|
|
27. LUCENE-129: Change finalizers to do "try {...} finally
|
|
|
|
{super.finalize();}" to make sure we don't miss finalizers in
|
|
|
|
classes above us. (Esmond Pitt via Mike McCandless)
|
|
|
|
|
2006-12-19 22:47:09 -05:00
|
|
|
28. LUCENE-754: Fix a problem introduced by LUCENE-651, causing
|
|
|
|
IndexReaders to hang around forever, in addition to not
|
|
|
|
fixing the original FieldCache performance problem.
|
|
|
|
(Chris Hostetter, Yonik Seeley)
|
|
|
|
|
2007-01-08 13:11:08 -05:00
|
|
|
29. LUCENE-140: Fix IndexReader.deleteDocument(int docNum) to
|
|
|
|
correctly raise ArrayIndexOutOfBoundsException when docNum is too
|
|
|
|
large. Previously, if docNum was only slightly too large (within
|
|
|
|
the same multiple of 8, ie, up to 7 ints beyond maxDoc), no
|
|
|
|
exception would be raised and instead the index would become
|
|
|
|
silently corrupted. The corruption then only appears much later,
|
|
|
|
in mergeSegments, when the corrupted segment is merged with
|
|
|
|
segment(s) after it. (Mike McCandless)
|
|
|
|
|
2007-01-09 12:13:57 -05:00
|
|
|
30. LUCENE-768: Fix case where an Exception during deleteDocument,
|
|
|
|
undeleteAll or setNorm in IndexReader could leave the reader in a
|
|
|
|
state where close() fails to release the write lock.
|
|
|
|
(Mike McCandless)
|
|
|
|
|
2007-01-09 14:03:29 -05:00
|
|
|
31. Remove "tvp" from known index file extensions because it is
|
|
|
|
never used. (Nicolas Lalevée via Bernhard Messer)
|
|
|
|
|
2007-01-10 11:06:33 -05:00
|
|
|
32. LUCENE-767: Change how SegmentReader.maxDoc() is computed to not
|
|
|
|
rely on file length check and instead use the SegmentInfo's
|
|
|
|
docCount that's already stored explicitly in the index. This is a
|
|
|
|
defensive bug fix (ie, there is no known problem seen "in real
|
|
|
|
life" due to this, just a possible future problem). (Chuck
|
|
|
|
Williams via Mike McCandless)
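Item 29 above (LUCENE-140) is easier to follow with a toy bit vector. The sketch below is not Lucene's own code; the class and method names are hypothetical. It assumes deletions are stored as one bit per document in a byte array, so an unchecked write only fails when the byte index is out of range. A docNum up to 7 past maxDoc still lands in the last byte and is silently accepted, which is the behaviour the fix removes by checking against maxDoc explicitly.

```java
/** Hypothetical stand-in for a deleted-docs bit vector; not Lucene's BitVector. */
final class DeletedDocsSketch {
    private final byte[] bits;
    private final int maxDoc;

    DeletedDocsSketch(int maxDoc) {
        this.maxDoc = maxDoc;
        // One bit per document, rounded up to whole bytes.
        this.bits = new byte[(maxDoc + 7) / 8];
    }

    /** Unchecked variant: relies on the array bounds check alone. */
    void deleteUnchecked(int docNum) {
        bits[docNum >> 3] |= (byte) (1 << (docNum & 7));
    }

    /** Checked variant: rejects any docNum outside [0, maxDoc). */
    void deleteChecked(int docNum) {
        if (docNum < 0 || docNum >= maxDoc) {
            throw new ArrayIndexOutOfBoundsException(docNum);
        }
        deleteUnchecked(docNum);
    }

    boolean isDeleted(int docNum) {
        return (bits[docNum >> 3] & (1 << (docNum & 7))) != 0;
    }
}
```

With maxDoc = 10 the array holds 16 bits, so deleteUnchecked(12) succeeds even though document 12 does not exist, while deleteChecked(12) throws.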
|
|
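Item 21 above (LUCENE-715) describes a general pattern: if a constructor acquires a lock and then fails, it has to release the lock itself because no caller ever receives an object to close. A minimal sketch of that pattern follows, assuming a java.util.concurrent ReentrantLock in place of Lucene's write lock; WriterSketch and the failDuringInit flag are hypothetical and exist only to simulate the IOException.

```java
import java.io.IOException;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical writer that must not leak its lock when construction fails. */
final class WriterSketch {
    private final ReentrantLock writeLock;

    WriterSketch(ReentrantLock writeLock, boolean failDuringInit) throws IOException {
        writeLock.lock();
        boolean success = false;
        try {
            // Stand-in for the work done after the lock is taken.
            if (failDuringInit) {
                throw new IOException("simulated failure during instantiation");
            }
            success = true;
        } finally {
            // Release only on failure; on success the lock is held until close().
            if (!success) {
                writeLock.unlock();
            }
        }
        this.writeLock = writeLock;
    }

    void close() {
        writeLock.unlock();
    }
}
```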
|
|
|
2006-06-01 16:33:18 -04:00
|
|
|
Optimizations
|
|
|
|
|
2006-10-27 02:11:56 -04:00
|
|
|
1. LUCENE-586: TermDocs.skipTo() is now more efficient for
|
|
|
|
multi-segment indexes. This will improve the performance of many
|
|
|
|
types of queries against a non-optimized index. (Andrew Hudson
|
|
|
|
via Yonik Seeley)
|
2006-06-01 16:33:18 -04:00
|
|
|
|
2006-07-29 05:54:48 -04:00
|
|
|
2. LUCENE-623: RAMDirectory.close now nulls out its reference to all
|
2006-07-06 18:14:07 -04:00
|
|
|
internal "files", allowing them to be GCed even if references to the
|
|
|
|
RAMDirectory itself still exist. (Nadav Har'El via Chris Hostetter)
|
2006-05-26 13:40:18 -04:00
|
|
|
|
2006-10-27 02:11:56 -04:00
|
|
|
3. LUCENE-629: Compressed fields are no longer uncompressed and
|
|
|
|
recompressed during segment merges (e.g. during indexing or
|
|
|
|
optimizing), thus improving performance. (Michael Busch via Otis
|
|
|
|
Gospodnetic)
|
2006-08-13 02:12:07 -04:00
|
|
|
|
2006-10-27 02:11:56 -04:00
|
|
|
4. LUCENE-388: Improve indexing performance when maxBufferedDocs is
|
|
|
|
large by keeping a count of buffered documents rather than
|
|
|
|
counting after each document addition. (Doron Cohen, Paul Smith,
|
|
|
|
Yonik Seeley)
|
2006-08-16 22:52:21 -04:00
|
|
|
|
2006-10-27 02:11:56 -04:00
|
|
|
5. Modified TermScorer.explain to use TermDocs.skipTo() instead of
|
|
|
|
looping through docs. (Grant Ingersoll)
|
2006-08-22 09:38:16 -04:00
|
|
|
|
2006-10-27 02:11:56 -04:00
|
|
|
6. LUCENE-672: New indexing segment merge policy flushes all
|
|
|
|
buffered docs to their own segment and delays a merge until
|
|
|
|
mergeFactor segments of a certain level have been accumulated.
|
|
|
|
This increases indexing performance in the presence of deleted
|
|
|
|
docs or partially full segments as well as enabling future
|
2006-11-28 17:20:24 -05:00
|
|
|
optimizations.
|
|
|
|
|
|
|
|
NOTE: this also fixes an "under-merging" bug whereby it is
|
|
|
|
possible to get far too many segments in your index (which will
|
|
|
|
drastically slow down search, risks exhausting file descriptor
|
|
|
|
limit, etc.). This can happen when the number of buffered docs
|
|
|
|
at close, plus the number of docs in the last non-ram segment is
|
|
|
|
greater than mergeFactor. (Ning Li, Yonik Seeley)
|
2006-09-14 14:31:21 -04:00
|
|
|
|
2006-10-09 13:26:49 -04:00
|
|
|
7. Lazy loaded fields unnecessarily retained an extra copy of loaded
|
|
|
|
String data. (Yonik Seeley)
|
|
|
|
|
2006-10-17 16:50:52 -04:00
|
|
|
8. LUCENE-443: ConjunctionScorer performance increase. Speed up
|
|
|
|
any BooleanQuery with more than one mandatory clause.
|
|
|
|
(Abdul Chaudhry, Paul Elschot via Yonik Seeley)
|
2006-10-17 20:56:08 -04:00
|
|
|
|
2006-10-27 02:11:56 -04:00
|
|
|
9. LUCENE-365: DisjunctionSumScorer performance increase of
|
|
|
|
~30%. Speeds up queries with optional clauses. (Paul Elschot via
|
|
|
|
Yonik Seeley)
|
2006-10-17 16:50:52 -04:00
|
|
|
|
2006-10-27 02:11:56 -04:00
|
|
|
10. LUCENE-695: Optimized BufferedIndexInput.readBytes() for medium
|
|
|
|
size buffers, which will speed up merging and retrieving binary
|
|
|
|
and compressed fields. (Nadav Har'El via Yonik Seeley)
|
2006-10-26 18:25:44 -04:00
|
|
|
|
2006-10-30 17:00:31 -05:00
|
|
|
11. LUCENE-687: Lazy skipping on proximity file speeds up most
|
|
|
|
queries involving term positions, including phrase queries.
|
|
|
|
(Michael Busch via Yonik Seeley)
|
|
|
|
|
2006-11-17 06:18:11 -05:00
|
|
|
12. LUCENE-714: Replaced 2 cases of manual for-loop array copying
|
|
|
|
with calls to System.arraycopy instead, in DocumentWriter.java.
|
|
|
|
(Nicolas Lalevee via Mike McCandless)
|
2006-10-30 17:00:31 -05:00
|
|
|
|
2006-11-28 13:17:56 -05:00
|
|
|
13. LUCENE-729: Non-recursive skipTo and next implementation of
|
|
|
|
TermDocs for a MultiReader. The old implementation could
|
|
|
|
recurse up to the number of segments in the index. (Yonik Seeley)
|
|
|
|
|
2006-12-10 21:38:29 -05:00
|
|
|
14. LUCENE-739: Improve segment merging performance by reusing
|
|
|
|
the norm array across different fields and doing bulk writes
|
|
|
|
of norms of segments with no deleted docs.
|
|
|
|
(Michael Busch via Yonik Seeley)
|
|
|
|
|
2006-12-16 21:40:37 -05:00
|
|
|
15. LUCENE-745: Add BooleanQuery.clauses(), allowing direct access
|
|
|
|
to the List of clauses and replaced the internal synchronized Vector
|
|
|
|
with an unsynchronized List. (Yonik Seeley)
|
|
|
|
|
2006-12-19 11:31:06 -05:00
|
|
|
16. LUCENE-750: Remove finalizers from FSIndexOutput and move the
|
|
|
|
FSIndexInput finalizer to the actual file so all clones don't
|
|
|
|
register a new finalizer. (Yonik Seeley)
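Item 12 above (LUCENE-714) swaps hand-written copy loops for System.arraycopy. The sketch below shows the two forms side by side on a hypothetical array-growing helper; it is not the DocumentWriter code itself, and it assumes newLength is at least src.length.

```java
/** Hypothetical helpers contrasting a manual copy loop with System.arraycopy. */
final class ArrayCopySketch {

    /** Element-by-element copy, the style the change replaced. */
    static int[] growWithLoop(int[] src, int newLength) {
        int[] dest = new int[newLength];
        for (int i = 0; i < src.length; i++) {
            dest[i] = src[i];
        }
        return dest;
    }

    /** Same result using the bulk copy provided by the JDK. */
    static int[] growWithArraycopy(int[] src, int newLength) {
        int[] dest = new int[newLength];
        System.arraycopy(src, 0, dest, 0, src.length);
        return dest;
    }
}
```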
|
|
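Item 6 above (LUCENE-672) can be pictured as a counter in base mergeFactor: every flush adds a new lowest-level segment, and a merge happens only once mergeFactor segments of the same level sit next to each other. The sketch below is a simplified model under that reading, not the IndexWriter implementation. Storing an explicit level per segment is an assumption made for illustration, and mergeFactor is assumed to be at least 2.

```java
import java.util.ArrayList;
import java.util.List;

/** Simplified model of a level-based merge policy; names are hypothetical. */
final class LevelMergeSketch {
    private final int mergeFactor; // assumed >= 2
    private final List<Integer> docCounts = new ArrayList<>();
    private final List<Integer> levels = new ArrayList<>();

    LevelMergeSketch(int mergeFactor) {
        this.mergeFactor = mergeFactor;
    }

    /** Buffered docs always become their own level-0 segment first. */
    void flush(int bufferedDocs) {
        docCounts.add(bufferedDocs);
        levels.add(0);
        while (tailIsFull()) {
            mergeTail();
        }
    }

    /** True when the last mergeFactor segments all share one level. */
    private boolean tailIsFull() {
        int n = levels.size();
        if (n < mergeFactor) {
            return false;
        }
        int level = levels.get(n - 1);
        for (int i = n - mergeFactor; i < n; i++) {
            if (levels.get(i) != level) {
                return false;
            }
        }
        return true;
    }

    /** Replaces the last mergeFactor segments with one segment a level up. */
    private void mergeTail() {
        int level = levels.get(levels.size() - 1);
        int docs = 0;
        for (int i = 0; i < mergeFactor; i++) {
            docs += docCounts.remove(docCounts.size() - 1);
            levels.remove(levels.size() - 1);
        }
        docCounts.add(docs);
        levels.add(level + 1);
    }

    int segmentCount() { return levels.size(); }
    int docCount(int i) { return docCounts.get(i); }
    int level(int i) { return levels.get(i); }
}
```

With mergeFactor = 3 and flushes of 10 docs each, four flushes leave two segments (30 docs and 10 docs), and nine flushes cascade into a single level-2 segment of 90 docs.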
|
|
|
2006-08-22 09:38:16 -04:00
|
|
|
Test Cases
|
2007-05-19 07:15:12 -04:00
|
|
|
|
2006-08-22 09:38:16 -04:00
|
|
|
1. Added TestTermScorer.java (Grant Ingersoll)
|
2007-05-19 07:15:12 -04:00
|
|
|
|
2006-11-28 14:27:58 -05:00
|
|
|
2. Added TestWindowsMMap.java (Benson Margulies via Mike McCandless)
|
2007-05-19 07:15:12 -04:00
|
|
|
|
|
|
|
3. LUCENE-744: Append the user.name property onto the temporary directory
|
|
|
|
that is created so it doesn't interfere with other users. (Grant Ingersoll)
|
2006-08-22 09:38:16 -04:00
|
|
|
|
|
|
|
Documentation
|
|
|
|
|
2006-10-27 02:11:56 -04:00
|
|
|
1. Added style sheet to xdocs named lucene.css and included in the
|
|
|
|
Anakia VSL descriptor. (Grant Ingersoll)
|
2006-08-22 09:38:16 -04:00
|
|
|
|
2006-10-27 02:11:56 -04:00
|
|
|
2. Added scoring.xml document into xdocs. Updated Similarity.java
|
|
|
|
scoring formula. (Grant Ingersoll and Steve Rowe. Updates from:
|
|
|
|
Michael McCandless, Doron Cohen, Chris Hostetter, Doug Cutting).
|
|
|
|
Issue 664.
|
2006-10-10 10:57:25 -04:00
|
|
|
|
2006-10-10 11:02:29 -04:00
|
|
|
3. Added javadocs for FieldSelectorResult.java. (Grant Ingersoll)
|
2006-08-22 09:38:16 -04:00
|
|
|
|
2007-05-19 07:15:12 -04:00
|
|
|
4. Moved xdocs directory to src/site/src/documentation/content/xdocs per
|
|
|
|
Issue 707. Site now builds using Forrest, just like the other Lucene
|
|
|
|
siblings. See http://wiki.apache.org/jakarta-lucene/HowToUpdateTheWebsite
|
|
|
|
for info on updating the website. (Grant Ingersoll with help from Steve Rowe,
|
|
|
|
Chris Hostetter, Doug Cutting, Otis Gospodnetic, Yonik Seeley)
|
2006-11-26 19:18:36 -05:00
|
|
|
|
2006-12-09 11:32:22 -05:00
|
|
|
5. Added in Developer and System Requirements sections under Resources (Grant Ingersoll)
|
|
|
|
|
2007-05-19 07:15:12 -04:00
|
|
|
6. LUCENE-713: Updated the Term Vector section of File Formats to include
|
|
|
|
documentation on how Offset and Position info are stored in the TVF file.
|
|
|
|
(Grant Ingersoll, Samir Abdou)
|
2006-12-16 15:23:30 -05:00
|
|
|
|
2007-05-19 07:15:12 -04:00
|
|
|
7. Added in link to Clover Test Code Coverage Reports under the Develop
|
|
|
|
section in Resources (Grant Ingersoll)
|
2006-12-16 15:23:30 -05:00
|
|
|
|
2006-12-19 06:31:27 -05:00
|
|
|
8. LUCENE-748: Added details for semantics of IndexWriter.close on
|
|
|
|
hitting an Exception. (Jed Wesley-Smith via Mike McCandless)
|
|
|
|
|
2007-05-19 07:15:12 -04:00
|
|
|
9. Added some text about what is contained in releases.
|
|
|
|
(Eric Haszlakiewicz via Grant Ingersoll)
|
2006-12-29 10:19:14 -05:00
|
|
|
|
2007-01-01 09:06:26 -05:00
|
|
|
10. LUCENE-758: Fix javadoc to clarify that RAMDirectory(Directory)
|
|
|
|
makes a full copy of the starting Directory. (Mike McCandless)
|
|
|
|
|
2007-01-03 15:59:01 -05:00
|
|
|
11. LUCENE-764: Fix javadocs to detail temporary space requirements
|
|
|
|
for IndexWriter's optimize(), addIndexes(*) and addDocument(...)
|
|
|
|
methods. (Mike McCandless)
|
|
|
|
|
2006-12-09 11:32:22 -05:00
|
|
|
Build
|
|
|
|
|
2007-05-19 07:15:12 -04:00
|
|
|
1. Added in clover test code coverage per http://issues.apache.org/jira/browse/LUCENE-721
|
|
|
|
To enable clover code coverage, you must have clover.jar in the ANT
|
|
|
|
classpath and specify -Drun.clover=true on the command line.
|
|
|
|
(Michael Busch and Grant Ingersoll)
|
2006-12-09 11:32:22 -05:00
|
|
|
|
2007-05-19 07:15:12 -04:00
|
|
|
2. Added a sysproperty in common-build.xml per LUCENE-752 to map java.io.tmpdir to
|
|
|
|
${build.dir}/test just like the tempDir sysproperty.
|
2006-12-17 11:45:53 -05:00
|
|
|
|
2007-05-19 07:15:12 -04:00
|
|
|
3. LUCENE-757: Added new target named init-dist that does setup for
|
|
|
|
distribution of both binary and source distributions. Called by package
|
|
|
|
and package-*-src
|
2006-12-20 22:14:02 -05:00
|
|
|
|
2007-05-19 07:15:12 -04:00
|
|
|
======================= Release 2.0.0 2006-05-26 =======================
|
2006-03-07 19:59:28 -05:00
|
|
|
|
2006-04-05 16:37:46 -04:00
|
|
|
API Changes
|
|
|
|
|
|
|
|
1. All deprecated methods and fields have been removed, except
|
|
|
|
DateField, which will still be supported for some time
|
2006-04-05 21:40:42 -04:00
|
|
|
so Lucene can read its date fields from old indexes
|
|
|
|
(Yonik Seeley & Grant Ingersoll)
|
|
|
|
|
2006-05-12 14:29:51 -04:00
|
|
|
2. DisjunctionSumScorer is no longer public.
|
|
|
|
(Paul Elschot via Otis Gospodnetic)
|
2006-08-22 09:38:16 -04:00
|
|
|
|
|
|
|
3. Creating a Field with both an empty name and an empty value
|
2006-05-13 19:50:35 -04:00
|
|
|
now throws an IllegalArgumentException
|
|
|
|
(Daniel Naber)
|
2006-05-12 14:29:51 -04:00
|
|
|
|
2006-12-19 06:31:27 -05:00
|
|
|
4. LUCENE-301: Added new IndexWriter({String,File,Directory},
|
|
|
|
Analyzer) constructors that do not take a boolean "create"
|
|
|
|
argument. These new constructors will create a new index if
|
|
|
|
necessary, else append to the existing one. (Dan Armbrust via
|
|
|
|
Mike McCandless)
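Item 4 above (LUCENE-301) boils down to deciding the old boolean "create" argument automatically: create when no index is present, otherwise append. The sketch below illustrates only that decision. It is not the IndexWriter code, and the marker file name used to detect an existing index is an assumption for the example.

```java
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical helper that derives the "create" flag from the directory state. */
final class OpenModeSketch {
    /** Assumed marker file; the real detection logic may differ. */
    static final String MARKER = "segments";

    /** Returns true when a new index should be created, false to append. */
    static boolean shouldCreate(Path indexDir) {
        return !Files.exists(indexDir.resolve(MARKER));
    }
}
```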
|
|
|
|
|
2006-05-25 14:49:04 -04:00
|
|
|
New features
|
|
|
|
|
|
|
|
1. LUCENE-496: Command line tool for modifying the field norms of an
|
|
|
|
existing index; added to contrib/miscellaneous. (Chris Hostetter)
|
|
|
|
|
2006-05-25 17:21:29 -04:00
|
|
|
2. LUCENE-577: SweetSpotSimilarity added to contrib/miscellaneous.
|
|
|
|
(Chris Hostetter)
|
|
|
|
|
2006-03-07 19:59:28 -05:00
|
|
|
Bug fixes
|
|
|
|
|
|
|
|
1. LUCENE-330: Fix issue of FilteredQuery not working properly within
|
|
|
|
BooleanQuery. (Paul Elschot via Erik Hatcher)
|
|
|
|
|
2006-03-08 21:42:13 -05:00
|
|
|
2. LUCENE-515: Make ConstantScoreRangeQuery and ConstantScoreQuery work
|
|
|
|
with RemoteSearchable. (Philippe Laflamme via Yonik Seeley)
|
|
|
|
|
2006-03-17 17:51:32 -05:00
|
|
|
3. Added methods to get/set writeLockTimeout and commitLockTimeout in
|
|
|
|
IndexWriter. These could be set in Lucene 1.4 using a system property.
|
2006-04-05 21:40:42 -04:00
|
|
|
This feature had been removed without adding the corresponding
|
2006-03-17 17:51:32 -05:00
|
|
|
getter/setter methods. (Daniel Naber)
|
2006-04-04 11:25:58 -04:00
|
|
|
|
|
|
|
4. LUCENE-413: Fixed ArrayIndexOutOfBoundsException
|
|
|
|
when using SpanQueries. (Paul Elschot via Yonik Seeley)
|
2006-04-04 12:18:51 -04:00
|
|
|
|
|
|
|
5. Implemented FilterIndexReader.getVersion() and isCurrent()
|
|
|
|
(Yonik Seeley)
|
|
|
|
|
2006-04-05 21:40:42 -04:00
|
|
|
6. LUCENE-540: Fixed a bug with IndexWriter.addIndexes(Directory[])
|
|
|
|
that sometimes caused the index order of documents to change.
|
|
|
|
(Yonik Seeley)
|
|
|
|
|
2006-04-06 00:02:09 -04:00
|
|
|
7. LUCENE-526: Fixed a bug in FieldSortedHitQueue that caused
|
|
|
|
subsequent String sorts with different locales to sort identically.
|
|
|
|
(Paul Cowan via Yonik Seeley)
|
2006-04-04 12:18:51 -04:00
|
|
|
|
2006-04-06 00:16:36 -04:00
|
|
|
8. LUCENE-541: Add missing extractTerms() to DisjunctionMaxQuery
|
|
|
|
(Stefan Will via Yonik Seeley)
|
|
|
|
|
2006-04-06 13:12:44 -04:00
|
|
|
9. LUCENE-514: Added getTermArrays() and extractTerms() to
|
|
|
|
MultiPhraseQuery (Eric Jain & Yonik Seeley)
|
|
|
|
|
2006-04-06 13:25:02 -04:00
|
|
|
10. LUCENE-512: Fixed ClassCastException in ParallelReader.getTermFreqVectors
|
|
|
|
(frederic via Yonik)
|
2006-04-06 13:12:44 -04:00
|
|
|
|
2006-05-18 03:39:23 -04:00
|
|
|
11. LUCENE-352: Fixed bug in SpanNotQuery that manifested as
|
|
|
|
NullPointerException when "exclude" query was not a SpanTermQuery.
|
2006-05-18 03:56:37 -04:00
|
|
|
(Chris Hostetter)
|
2006-05-18 03:39:23 -04:00
|
|
|
|
2006-05-19 12:39:42 -04:00
|
|
|
12. LUCENE-572: Fixed bug in SpanNotQuery hashCode, was ignoring exclude clause
|
2006-05-18 03:56:37 -04:00
|
|
|
(Chris Hostetter)
|
2006-05-19 12:39:42 -04:00
|
|
|
|
|
|
|
13. LUCENE-561: Fixed some ParallelReader bugs. NullPointerException if the reader
|
|
|
|
didn't know about the field yet, reader didn't keep track if it had deletions,
|
|
|
|
and deleteDocument calls could circumvent synchronization on the subreaders.
|
|
|
|
(Chuck Williams via Yonik Seeley)
|
2006-05-19 16:29:09 -04:00
|
|
|
|
|
|
|
14. LUCENE-556: Added empty extractTerms() implementation to MatchAllDocsQuery and
|
|
|
|
ConstantScoreQuery in order to allow their use with a MultiSearcher.
|
|
|
|
(Yonik Seeley)
|
2006-05-20 21:01:42 -04:00
|
|
|
|
2006-05-23 11:01:11 -04:00
|
|
|
15. LUCENE-546: Removed 2GB file size limitations for RAMDirectory.
|
|
|
|
(Peter Royal, Michael Chan, Yonik Seeley)
|
|
|
|
|
2006-05-26 12:14:12 -04:00
|
|
|
16. LUCENE-485: Don't hold commit lock while removing obsolete index
|
|
|
|
files. (Luc Vanlerberghe via cutting)
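Item 12 above (LUCENE-572) is an instance of a common equals/hashCode mistake: a field that takes part in equality is left out of the hash, or the other way round. The sketch below uses a hypothetical two-field class with plain strings in place of the include and exclude span queries. It shows the corrected shape, where both fields feed equals and hashCode.

```java
import java.util.Objects;

/** Hypothetical stand-in for a query with an include and an exclude clause. */
final class SpanNotSketch {
    private final String include;
    private final String exclude;

    SpanNotSketch(String include, String exclude) {
        this.include = Objects.requireNonNull(include);
        this.exclude = Objects.requireNonNull(exclude);
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) {
            return true;
        }
        if (!(o instanceof SpanNotSketch)) {
            return false;
        }
        SpanNotSketch other = (SpanNotSketch) o;
        return include.equals(other.include) && exclude.equals(other.exclude);
    }

    @Override
    public int hashCode() {
        // Both clauses contribute; dropping exclude here was the kind of bug fixed.
        return Objects.hash(include, exclude);
    }
}
```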
|
|
|
|
|
|
|
|
|
2006-03-03 13:20:51 -05:00
|
|
|
1.9.1
|
|
|
|
|
|
|
|
Bug fixes
|
|
|
|
|
|
|
|
1. LUCENE-511: Fix a bug in the BufferedIndexOutput optimization
|
2006-03-03 23:07:50 -05:00
|
|
|
introduced in 1.9-final. (Shay Banon & Steven Tamm via cutting)
|
2006-03-03 13:20:51 -05:00
|
|
|
|
2006-02-28 11:49:44 -05:00
|
|
|
1.9 final
|
|
|
|
|
2006-08-13 02:06:23 -04:00
|
|
|
Note that this release is mostly but not 100% source compatible with
|
2006-02-28 11:49:44 -05:00
|
|
|
the previous release of Lucene (1.4.3). In other words, you should
|
|
|
|
make sure your application compiles with this version of Lucene before
|
|
|
|
you replace the old Lucene JAR with the new one. Many methods have
|
|
|
|
been deprecated in anticipation of release 2.0, so deprecation
|
|
|
|
warnings are to be expected when upgrading from 1.4.3 to 1.9.
|
2006-02-21 12:00:40 -05:00
|
|
|
|
2006-02-26 09:40:05 -05:00
|
|
|
Bug fixes
|
|
|
|
|
|
|
|
1. The fix that made IndexWriter.setMaxBufferedDocs(1) work had negative
|
|
|
|
effects on indexing performance and has thus been reverted. The
|
|
|
|
argument for setMaxBufferedDocs(int) must now be at least 2, otherwise
|
2006-03-04 08:22:45 -05:00
|
|
|
an exception is thrown. (Daniel Naber)
|
2006-02-26 09:40:05 -05:00
|
|
|
|
2006-02-21 12:00:40 -05:00
|
|
|
Optimizations
|
|
|
|
|
|
|
|
1. Optimized BufferedIndexOutput.writeBytes() to use
|
|
|
|
System.arraycopy() in more cases, rather than copying byte-by-byte.
|
|
|
|
(Lukas Zapletal via Cutting)
|
|
|
|
|
2004-08-07 07:36:39 -04:00
|
|
|
1.9 RC1
|
|
|
|
|
2005-05-04 19:26:00 -04:00
|
|
|
Requirements
|
|
|
|
|
|
|
|
1. To compile and use Lucene you now need Java 1.4 or later.
|
2005-11-11 20:08:01 -05:00
|
|
|
|
2004-12-12 06:46:02 -05:00
|
|
|
Changes in runtime behavior
|
|
|
|
|
2005-11-11 20:08:01 -05:00
|
|
|
1. FuzzyQuery can no longer throw a TooManyClauses exception. If a
|
|
|
|
FuzzyQuery expands to more than BooleanQuery.maxClauseCount
|
|
|
|
terms only the BooleanQuery.maxClauseCount most similar terms
|
2004-12-12 06:46:02 -05:00
|
|
|
go into the rewritten query and thus the exception is avoided.
|
|
|
|
(Christoph)
|
|
|
|
|
2005-03-07 14:26:27 -05:00
|
|
|
2. Changed system property from "org.apache.lucene.lockdir" to
|
2004-12-12 06:46:02 -05:00
|
|
|
"org.apache.lucene.lockDir", so that its casing follows the existing
|
2005-11-11 20:08:01 -05:00
|
|
|
pattern used in other Lucene system properties. (Bernhard)
|
2004-12-12 06:46:02 -05:00
|
|
|
|
2004-12-14 18:13:34 -05:00
|
|
|
3. The terms of RangeQueries and FuzzyQueries are now converted to
|
|
|
|
lowercase by default (as has been the case for PrefixQueries
|
|
|
|
and WildcardQueries before). Use setLowercaseExpandedTerms(false)
|
|
|
|
to disable that behavior but note that this also affects
|
|
|
|
PrefixQueries and WildcardQueries. (Daniel Naber)
|
2005-04-19 23:28:32 -04:00
|
|
|
|
|
|
|
4. Document frequency that is computed when MultiSearcher is used is now
|
|
|
|
computed correctly and "globally" across subsearchers and indices, while
|
|
|
|
before it used to be computed locally to each index, which caused
|
|
|
|
ranking across multiple indices not to be equivalent.
|
2005-04-20 16:39:18 -04:00
|
|
|
(Chuck Williams, Wolf Siberski via Otis, bug #31841)
|
2005-04-19 23:28:32 -04:00
|
|
|
|
2005-05-04 19:34:52 -04:00
|
|
|
5. When opening an IndexWriter with create=true, Lucene now only deletes
|
|
|
|
its own files from the index directory (looking at the file name suffixes
|
|
|
|
to decide if a file belongs to Lucene). The old behavior was to delete
|
2005-06-10 03:30:15 -04:00
|
|
|
all files. (Daniel Naber and Bernhard Messer, bug #34695)
|
2005-11-11 20:08:01 -05:00
|
|
|
|
2005-06-07 18:13:44 -04:00
|
|
|
6. The version of an IndexReader, as returned by getCurrentVersion()
|
|
|
|
and getVersion() doesn't start at 0 anymore for new indexes. Instead, it
|
|
|
|
is now initialized by the system time in milliseconds.
|
|
|
|
(Bernhard Messer via Daniel Naber)
|
2005-11-11 20:08:01 -05:00
|
|
|
|
2005-07-13 16:55:53 -04:00
|
|
|
7. Several default values cannot be set via system properties anymore, as
|
|
|
|
this has been considered inappropriate for a library like Lucene. For
|
|
|
|
most properties there are set/get methods available in IndexWriter which
|
2005-07-15 17:03:33 -04:00
|
|
|
you should use instead. This affects the following properties:
|
|
|
|
See IndexWriter for getter/setter methods:
|
|
|
|
org.apache.lucene.writeLockTimeout, org.apache.lucene.commitLockTimeout,
|
|
|
|
org.apache.lucene.minMergeDocs, org.apache.lucene.maxMergeDocs,
|
|
|
|
org.apache.lucene.maxFieldLength, org.apache.lucene.termIndexInterval,
|
|
|
|
org.apache.lucene.mergeFactor,
|
|
|
|
See BooleanQuery for getter/setter methods:
|
|
|
|
org.apache.lucene.maxClauseCount
|
2005-08-09 16:12:50 -04:00
|
|
|
See FSDirectory for getter/setter methods:
|
|
|
|
disableLuceneLocks
|
2005-07-13 16:55:53 -04:00
|
|
|
(Daniel Naber)
|
2005-11-09 01:44:10 -05:00
|
|
|
|
2005-11-09 13:50:21 -05:00
|
|
|
8. Fixed FieldCacheImpl to use user-provided IntParser and FloatParser,
|
|
|
|
instead of using Integer and Float classes for parsing.
|
2005-11-09 01:44:10 -05:00
|
|
|
(Yonik Seeley via Otis Gospodnetic)
|
2005-11-11 20:08:01 -05:00
|
|
|
|
2005-12-07 12:48:37 -05:00
|
|
|
9. Expert level search routines returning TopDocs and TopFieldDocs
|
|
|
|
no longer normalize scores. This also fixes bugs related to
|
|
|
|
MultiSearchers and score sorting/normalization.
|
|
|
|
(Luc Vanlerberghe via Yonik Seeley, LUCENE-469)
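Item 1 above says a FuzzyQuery now keeps only the BooleanQuery.maxClauseCount most similar terms. One common way to do that kind of selection is a bounded min-heap, sketched below. This is an illustration of the idea and not the FuzzyQuery source; the class name, the map of similarity scores, and the method signature are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

/** Hypothetical top-N selection over candidate terms and their similarity. */
final class TopTermsSketch {

    /** Returns at most maxClauseCount terms, most similar first. */
    static List<String> mostSimilar(Map<String, Double> similarityByTerm, int maxClauseCount) {
        Comparator<Map.Entry<String, Double>> bySimilarity = Map.Entry.comparingByValue();
        // Min-heap: the least similar retained term is always at the head.
        PriorityQueue<Map.Entry<String, Double>> heap = new PriorityQueue<>(bySimilarity);
        for (Map.Entry<String, Double> entry : similarityByTerm.entrySet()) {
            heap.add(entry);
            if (heap.size() > maxClauseCount) {
                heap.poll(); // drop the weakest candidate instead of failing
            }
        }
        List<String> result = new ArrayList<>();
        while (!heap.isEmpty()) {
            // Polling yields ascending similarity, so insert at the front.
            result.add(0, heap.poll().getKey());
        }
        return result;
    }
}
```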
|
|
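Item 5 above describes choosing which files to delete by looking at file name suffixes. The sketch below shows that kind of filter. The extension list is an illustrative subset chosen for the example, not the authoritative list Lucene uses, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical filter that only selects files that look like index files. */
final class OwnFilesSketch {
    /** Illustrative subset of index file extensions (an assumption). */
    static final List<String> EXTENSIONS =
        Arrays.asList("cfs", "fnm", "fdx", "fdt", "tii", "tis", "frq", "prx", "del");

    static boolean looksLikeIndexFile(String name) {
        if (name.equals("segments") || name.equals("deletable")) {
            return true;
        }
        int dot = name.lastIndexOf('.');
        return dot >= 0 && EXTENSIONS.contains(name.substring(dot + 1));
    }

    /** Keeps the input order and leaves unrelated files untouched. */
    static List<String> filesToDelete(List<String> names) {
        List<String> result = new ArrayList<>();
        for (String name : names) {
            if (looksLikeIndexFile(name)) {
                result.add(name);
            }
        }
        return result;
    }
}
```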
|
|
|
2004-11-19 15:39:02 -05:00
|
|
|
New features
|
|
|
|
|
|
|
|
1. Added support for stored compressed fields (patch #31149)
|
|
|
|
(Bernhard Messer via Christoph)
|
2005-11-11 20:08:01 -05:00
|
|
|
|
2004-11-19 15:39:02 -05:00
|
|
|
2. Added support for binary stored fields (patch #29370)
|
|
|
|
(Drew Farris and Bernhard Messer via Christoph)
|
2004-08-13 15:33:25 -04:00
|
|
|
|
2004-11-19 15:39:02 -05:00
|
|
|
3. Added support for position and offset information in term vectors
|
|
|
|
(patch #18927). (Grant Ingersoll & Christoph)
|
|
|
|
|
2004-11-19 15:53:24 -05:00
|
|
|
4. A new class DateTools has been added. It allows you to format dates
|
2004-09-05 18:09:26 -04:00
|
|
|
in a readable format adequate for indexing. Unlike the existing
|
|
|
|
DateField class DateTools can cope with dates before 1970 and it
|
|
|
|
forces you to specify the desired date resolution (e.g. month, day,
|
|
|
|
second, ...) which can make RangeQuerys on those fields more efficient.
|
|
|
|
(Daniel Naber)
|
2004-11-19 15:39:02 -05:00
|
|
|
|
2005-11-11 20:08:01 -05:00
|
|
|
5. QueryParser now correctly works with Analyzers that can return more
|
2004-11-19 15:39:02 -05:00
|
|
|
than one token per position. For example, a query "+fast +car"
|
|
|
|
would be parsed as "+fast +(car automobile)" if the Analyzer
|
2005-11-11 20:08:01 -05:00
|
|
|
returns "car" and "automobile" at the same position whenever it
|
2004-11-19 15:39:02 -05:00
|
|
|
finds "car" (Patch #23307).
|
|
|
|
(Pierrick Brihaye, Daniel Naber)
|
|
|
|
|
2004-11-19 15:53:24 -05:00
|
|
|
6. Permit unbuffered Directory implementations (e.g., using mmap).
|
2004-09-16 17:13:37 -04:00
|
|
|
InputStream is replaced by the new classes IndexInput and
|
2004-09-28 14:15:52 -04:00
|
|
|
BufferedIndexInput. OutputStream is replaced by the new classes
|
|
|
|
IndexOutput and BufferedIndexOutput. InputStream and OutputStream
|
|
|
|
are now deprecated and FSDirectory is now subclassable. (cutting)
|
2004-08-17 16:38:46 -04:00
|
|
|
|
2004-11-19 15:53:24 -05:00
|
|
|
7. Add native Directory and TermDocs implementations that work under
|
2004-09-29 12:54:44 -04:00
|
|
|
GCJ. These require GCC 3.4.0 or later and have only been tested
|
|
|
|
on Linux. Use 'ant gcj' to build demo applications. (cutting)
|
|
|
|
|
2004-11-19 15:53:24 -05:00
|
|
|
8. Add MMapDirectory, which uses nio to mmap input files. This is
|
2004-09-29 12:54:44 -04:00
|
|
|
still somewhat slower than FSDirectory. However it uses less
|
|
|
|
memory per query term, since a new buffer is not allocated per
|
|
|
|
term, which may help applications which use, e.g., wildcard
|
2004-10-04 15:45:27 -04:00
|
|
|
queries. It may also someday be faster. (cutting & Paul Elschot)
|
2004-09-29 12:54:44 -04:00
|
|
|
|
2004-11-19 15:53:24 -05:00
|
|
|
9. Added javadocs-internal to build.xml - bug #30360
|
2004-11-19 15:39:02 -05:00
|
|
|
(Paul Elschot via Otis)
|
2005-11-11 20:08:01 -05:00
|
|
|
|
2004-11-23 16:07:49 -05:00
|
|
|
10. Added RangeFilter, a more generically useful filter than DateFilter.
|
|
|
|
(Chris M Hostetter via Erik)
|
|
|
|
|
|
|
|
11. Added NumberTools, a utility class for indexing numeric fields.
|
|
|
|
(adapted from code contributed by Matt Quail; committed by Erik)
|
2004-12-30 08:04:13 -05:00
|
|
|
|
2005-02-04 12:46:21 -05:00
|
|
|
12. Added public static IndexReader.main(String[] args) method.
|
|
|
|
IndexReader can now be used directly at command line level
|
2005-01-21 13:28:35 -05:00
|
|
|
to list and optionally extract the individual files from an existing
|
2004-12-30 08:04:13 -05:00
|
|
|
compound index file.
|
|
|
|
(adapted from code contributed by Garrett Rooney; committed by Bernhard)
|
2005-11-11 20:08:01 -05:00
|
|
|
|
2005-03-09 13:58:26 -05:00
|
|
|
13. Add IndexWriter.setTermIndexInterval() method. See javadocs.
|
|
|
|
(Doug Cutting)
|
|
|
|
|
2005-03-12 05:38:51 -05:00
|
|
|
14. Added LucenePackage, whose static get() method returns java.util.Package,
|
2005-03-11 22:23:13 -05:00
|
|
|
which lets the caller get the Lucene version information specified in
|
|
|
|
the Lucene Jar.
|
|
|
|
(Doug Cutting via Otis)
|
2005-04-25 20:21:53 -04:00
|
|
|
|
|
|
|
15. Added Hits.iterator() method and corresponding HitIterator and Hit objects.
|
|
|
|
This provides standard java.util.Iterator iteration over Hits.
|
|
|
|
Each call to the iterator's next() method returns a Hit object.
|
|
|
|
(Jeremy Rayner via Erik)
|
2005-11-11 20:08:01 -05:00
|
|
|
|
2005-05-12 13:59:41 -04:00
|
|
|
16. Add ParallelReader, an IndexReader that combines separate indexes
|
|
|
|
over different fields into a single virtual index. (Doug Cutting)
|
|
|
|
|
2005-06-02 12:48:40 -04:00
|
|
|
17. Add IntParser and FloatParser interfaces to FieldCache, so that
|
|
|
|
fields in arbitrary formats can be cached as ints and floats.
|
|
|
|
(Doug Cutting)
|
2005-11-11 20:08:01 -05:00
|
|
|
|
2005-06-06 18:29:30 -04:00
|
|
|
18. Added class org.apache.lucene.index.IndexModifier which combines
|
|
|
|
IndexWriter and IndexReader, so you can add and delete documents without
|
|
|
|
worrying about synchronisation/locking issues.
|
|
|
|
(Daniel Naber)
|
2005-06-02 12:48:40 -04:00
|
|
|
|
2005-11-11 20:08:01 -05:00
|
|
|
19. Lucene can now be used inside an unsigned applet, as Lucene's access
|
2005-08-09 16:17:30 -04:00
|
|
|
    to system properties will not cause a SecurityException anymore.
    (Jon Schuster via Daniel Naber, bug #34359)

20. Added a new class MatchAllDocsQuery that matches all documents.
    (John Wang via Daniel Naber, bug #34946)

21. Added ability to omit norms on a per-field basis to decrease
    index size and memory consumption when there are many indexed fields.
    See Field.setOmitNorms()
    (Yonik Seeley, LUCENE-448)

22. Added NullFragmenter to contrib/highlighter, which is useful for
    highlighting entire documents or fields.
    (Erik Hatcher)

23. Added regular expression queries, RegexQuery and SpanRegexQuery.
    Note the same term enumeration caveats apply with these queries as
    apply to WildcardQuery and other term expanding queries.
    These two new queries are not currently supported via QueryParser.
    (Erik Hatcher)

24. Added ConstantScoreQuery which wraps a filter and produces a score
    equal to the query boost for every matching document.
    (Yonik Seeley, LUCENE-383)

25. Added ConstantScoreRangeQuery which produces a constant score for
    every document in the range. One advantage over a normal RangeQuery
    is that it doesn't expand to a BooleanQuery and thus doesn't have a maximum
    number of terms the range can cover. Both endpoints may also be open.
    (Yonik Seeley, LUCENE-383)

26. Added ability to specify a minimum number of optional clauses that
    must match in a BooleanQuery. See BooleanQuery.setMinimumNumberShouldMatch().
    (Paul Elschot, Chris Hostetter via Yonik Seeley, LUCENE-395)

27. Added DisjunctionMaxQuery which provides the maximum score across its clauses.
    It's very useful for searching across multiple fields.
    (Chuck Williams via Yonik Seeley, LUCENE-323)

28. New class ISOLatin1AccentFilter that replaces accented characters in the ISO
    Latin 1 character set with their unaccented equivalents.
    (Sven Duzont via Erik Hatcher)

29. New class KeywordAnalyzer. "Tokenizes" the entire stream as a single token.
    This is useful for data like zip codes, ids, and some product names.
    (Erik Hatcher)

30. Copied LengthFilter from contrib area to core. Removes words that are too
    long or too short from the stream.
    (David Spencer via Otis and Daniel)

31. Added getPositionIncrementGap(String fieldName) to Analyzer. This allows
    custom analyzers to put gaps between Field instances with the same field
    name, preventing phrase or span queries crossing these boundaries. The
    default implementation issues a gap of 0, allowing the default token
    position increment of 1 to put the next field's first token into a
    successive position.
    (Erik Hatcher, with advice from Yonik)

32. StopFilter can now ignore case when checking for stop words.
    (Grant Ingersoll via Yonik, LUCENE-248)

33. Add TopDocCollector and TopFieldDocCollector. These simplify the
    implementation of hit collectors that collect only the
    top-scoring or top-sorting hits.

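Entry 31 above describes how a position increment gap keeps phrase queries from matching across separate values of the same field. The following is a minimal, library-free sketch of that arithmetic. The class and method names are hypothetical and are not part of Lucene's API; only the rule "each token advances by 1, plus the gap at the start of each later value" is taken from the entry.

```java
import java.util.ArrayList;
import java.util.List;

// Library-free model of a position increment gap. All names here are
// hypothetical; this is not Lucene's Analyzer API.
class PositionGapSketch {

    /**
     * Assigns a position to each whitespace-separated token of each value.
     * Every token advances the position by 1; the first token of every value
     * after the first additionally advances it by {@code gap}.
     */
    static List<Integer> positions(String[] values, int gap) {
        List<Integer> result = new ArrayList<>();
        int position = -1; // so that the very first token lands on position 0
        for (String value : values) {
            String trimmed = value.trim();
            if (trimmed.isEmpty()) {
                continue; // a value without tokens contributes nothing
            }
            String[] tokens = trimmed.split("\\s+");
            for (int j = 0; j < tokens.length; j++) {
                boolean startsLaterValue = (j == 0) && !result.isEmpty();
                position += 1 + (startsLaterValue ? gap : 0);
                result.add(position);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String[] values = {"quick brown", "fox jumps"};
        System.out.println(positions(values, 0));   // [0, 1, 2, 3]
        System.out.println(positions(values, 100)); // [0, 1, 102, 103]
    }
}
```

With a gap of 0, "brown" and "fox" occupy adjacent positions, so an exact phrase "brown fox" would match across the two values. With a gap of 100 they are far apart and the phrase no longer matches.
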
API Changes

 1. Several methods and fields have been deprecated. The API documentation
    contains information about the recommended replacements. It is planned
    that most of the deprecated methods and fields will be removed in
    Lucene 2.0. (Daniel Naber)

 2. The Russian and the German analyzers have been moved to contrib/analyzers.
    Also, the WordlistLoader class has been moved one level up in the
    hierarchy and is now org.apache.lucene.analysis.WordlistLoader
    (Daniel Naber)

 3. The API contained methods that were declared to throw an IOException
    but never did. These declarations have been removed. If
    your code tries to catch these exceptions you might need to remove
    those catch clauses to avoid compile errors. (Daniel Naber)

 4. Add a serializable Parameter Class to standardize parameter enum
    classes in BooleanClause and Field. (Christoph)

 5. Added rewrite methods to all SpanQuery subclasses that nest other SpanQuerys.
    This allows custom SpanQuery subclasses that rewrite (for term expansion, for
    example) to nest within the built-in SpanQuery classes successfully.

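Entry 4 above mentions a serializable base class for enum-like parameter constants. One common way to build such a class in pre-enum Java is the typesafe-enum pattern with readResolve(), sketched below. This is an illustration under assumptions, not Lucene's implementation: ParameterSketch, Occur, and roundTrip are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.ObjectStreamException;
import java.io.Serializable;
import java.io.StreamCorruptedException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical typesafe-enum base class whose constants keep their identity
// across serialization. Not Lucene's code.
class ParameterSketch implements Serializable {
    private static final long serialVersionUID = 1L;

    // Registry of every constant created, keyed by concrete class and name.
    private static final Map<String, ParameterSketch> ALL = new HashMap<>();

    private final String name;

    protected ParameterSketch(String name) {
        this.name = name;
        String key = key();
        if (ALL.containsKey(key)) {
            throw new IllegalArgumentException("Parameter name " + key + " already used");
        }
        ALL.put(key, this);
    }

    private String key() {
        return getClass().getName() + " " + name;
    }

    @Override
    public String toString() {
        return name;
    }

    // Called by Java serialization after reading an instance: swap the fresh
    // copy for the registered constant so that == comparisons keep working.
    protected Object readResolve() throws ObjectStreamException {
        ParameterSketch canonical = ALL.get(key());
        if (canonical == null) {
            throw new StreamCorruptedException("Unknown parameter value: " + name);
        }
        return canonical;
    }

    // Example constant set, loosely modeled on clause occurrence flags.
    static final class Occur extends ParameterSketch {
        private static final long serialVersionUID = 1L;
        static final Occur MUST = new Occur("MUST");
        static final Occur SHOULD = new Occur("SHOULD");

        private Occur(String name) {
            super(name);
        }
    }

    // Serializes a value to bytes and reads it back.
    static Object roundTrip(Object value) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
        }
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(roundTrip(Occur.MUST) == Occur.MUST); // true
    }
}
```

Without readResolve(), the deserialized object would be a distinct instance and identity checks against the constants would fail.
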
Bug fixes

 1. The JSP demo page (src/jsp/results.jsp) now properly closes the
    IndexSearcher it opens. (Daniel Naber)

 2. Fixed a bug in IndexWriter.addIndexes(IndexReader[] readers) that
    prevented deletion of obsolete segments. (Christoph Goller)

 3. Fix in FieldInfos to avoid the return of an extra blank field in
    IndexReader.getFieldNames() (Patch #19058). (Mark Harwood via Bernhard)

 4. Some combinations of BooleanQuery and MultiPhraseQuery (formerly
    PhrasePrefixQuery) could provoke UnsupportedOperationException
    (bug #33161). (Rhett Sutphin via Daniel Naber)

 5. Fixed a small bug in skipTo of ConjunctionScorer that caused a
    NullPointerException if skipTo() was called without a prior call
    to next(). (Christoph)

 6. Disable Similarity.coord() in the scoring of most automatically
    generated boolean queries. The coord() score factor is
    appropriate when clauses are independently specified by a user,
    but is usually not appropriate when clauses are generated
    automatically, e.g., by a fuzzy, wildcard or range query. Matches
    on such automatically generated queries are no longer penalized
    for not matching all terms. (Doug Cutting, Patch #33472)

 7. Getting a lock file with Lock.obtain(long) was supposed to wait for
    a given number of milliseconds, but this didn't work.
    (John Wang via Daniel Naber, Bug #33799)

 8. Fix FSDirectory.createOutput() to always create new files.
    Previously, existing files were overwritten, and an index could be
    corrupted when the old version of a file was longer than the new.
    Now any existing file is first removed. (Doug Cutting)

 9. Fix BooleanQuery containing nested SpanTermQuery's, which previously
    could return an incorrect number of hits.
    (Reece Wilton via Erik Hatcher, Bug #35157)

10. Fix NullPointerException that could occur with a MultiPhraseQuery
    inside a BooleanQuery.
    (Hans Hjelm and Scotty Allen via Daniel Naber, Bug #35626)

11. Fixed SnowballFilter to pass through the position increment from
    the original token.
    (Yonik Seeley via Erik Hatcher, LUCENE-437)

12. Added Unicode range of Korean characters to StandardTokenizer,
    grouping contiguous characters into a token rather than one token
    per character. This change also changes the token type to "<CJ>"
    for Chinese and Japanese character tokens (previously it was "<CJK>").
    (Cheolgoo Kang via Otis and Erik, LUCENE-444 and LUCENE-461)

13. FieldsReader now looks at FieldInfo.storeOffsetWithTermVector and
    FieldInfo.storePositionWithTermVector and creates the Field with
    correct TermVector parameter.
    (Frank Steinmann via Bernhard, LUCENE-455)

14. Fixed WildcardQuery to prevent "cat" matching "ca??".
    (Xiaozheng Ma via Bernhard, LUCENE-306)

15. Fixed a bug where MultiSearcher and ParallelMultiSearcher could
    change the sort order when sorting by string for documents without
    a value for the sort field.
    (Luc Vanlerberghe via Yonik, LUCENE-453)

16. Fixed a sorting problem with MultiSearchers that can lead to
    missing or duplicate docs due to equal docs sorting in an arbitrary order.
    (Yonik Seeley, LUCENE-456)

17. A single hit using the expert level sorted search methods
    resulted in the score not being normalized.
    (Yonik Seeley, LUCENE-462)

18. Fixed inefficient memory usage when loading an index into RAMDirectory.
    (Volodymyr Bychkoviak via Bernhard, LUCENE-475)

19. Corrected term offsets returned by ChineseTokenizer.
    (Ray Tsang via Erik Hatcher, LUCENE-324)

20. Fixed MultiReader.undeleteAll() to correctly update numDocs.
    (Robert Kirchgessner via Doug Cutting, LUCENE-479)

21. Race condition in IndexReader.getCurrentVersion() and isCurrent()
    fixed by acquiring the commit lock.
    (Luc Vanlerberghe via Yonik Seeley, LUCENE-481)

22. IndexWriter.setMaxBufferedDocs(1) didn't have the expected effect;
    this has now been fixed. (Daniel Naber)

23. Fixed QueryParser when called with a date in local form like
    "[1/16/2000 TO 1/18/2000]". This query did not include the documents
    of 1/18/2000, i.e. the last day was not included. (Daniel Naber)

24. Removed sorting constraint that threw an exception if there were
    not yet any values for the sort field (Yonik Seeley, LUCENE-374)

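Bug fix 14 above turns on the rule that each `?` in a wildcard pattern must consume exactly one character, so "ca??" needs a four-character term. A small, library-free matcher showing that rule is sketched below. WildcardSketch is a hypothetical name, and the recursive approach is chosen for brevity, not because it mirrors how WildcardQuery is implemented.

```java
// Hypothetical stand-alone wildcard matcher; not Lucene's WildcardQuery code.
class WildcardSketch {

    /** '?' matches exactly one character, '*' matches any run, including an empty one. */
    static boolean matches(String pattern, String text) {
        return matches(pattern, 0, text, 0);
    }

    private static boolean matches(String p, int pi, String t, int ti) {
        while (pi < p.length()) {
            char c = p.charAt(pi);
            if (c == '*') {
                // Try every possible length for the run consumed by '*'.
                for (int k = ti; k <= t.length(); k++) {
                    if (matches(p, pi + 1, t, k)) {
                        return true;
                    }
                }
                return false;
            }
            if (ti >= t.length()) {
                return false; // pattern still needs a character, text is exhausted
            }
            if (c != '?' && c != t.charAt(ti)) {
                return false; // literal mismatch
            }
            pi++;
            ti++;
        }
        return ti == t.length(); // both pattern and text must be fully consumed
    }

    public static void main(String[] args) {
        System.out.println(matches("ca??", "cat"));  // false
        System.out.println(matches("ca??", "cats")); // true
        System.out.println(matches("ca*", "cat"));   // true
    }
}
```
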
Optimizations

 1. Disk usage (peak requirements during indexing and optimization)
    in case of compound file format has been improved.
    (Bernhard, Dmitry, and Christoph)

 2. Optimize the performance of certain uses of BooleanScorer,
    TermScorer and IndexSearcher. In particular, a BooleanQuery
    composed of TermQuery, with not all terms required, that returns a
    TopDocs (e.g., through a Hits with no Sort specified) runs much
    faster. (cutting)

 3. Removed synchronization from reading of term vectors with an
    IndexReader (Patch #30736). (Bernhard Messer via Christoph)

 4. Optimize term-dictionary lookup to allocate far fewer terms when
    scanning for the matching term. This speeds searches involving
    low-frequency terms, where the cost of dictionary lookup can be
    significant. (cutting)

 5. Optimize fuzzy queries so the standard fuzzy queries with a prefix
    of 0 now run 20-50% faster (Patch #31882).
    (Jonathan Hager via Daniel Naber)

 6. A new version of BooleanScorer (BooleanScorer2) was added that delivers
    documents in increasing order and implements skipTo. For queries
    with required or forbidden clauses it may be faster than the old
    BooleanScorer; for BooleanQueries consisting only of optional
    clauses it is probably slower. The new BooleanScorer is now the
    default. (Patch 31785 by Paul Elschot via Christoph)

 7. Use uncached access to norms when merging to reduce RAM usage.
    (Bug #32847). (Doug Cutting)

 8. Don't read term index when random-access is not required. This
    reduces time to open IndexReaders and they use less memory when
    random access is not required, e.g., when merging segments. The
    term index is now read into memory lazily at the first
    random-access. (Doug Cutting)

 9. Optimize IndexWriter.addIndexes(Directory[]) when the number of
    added indexes is larger than mergeFactor. Previously this could
    result in quadratic performance. Now performance is n log(n).
    (Doug Cutting)

10. Speed up the creation of TermEnum for indices with multiple
    segments and deleted documents, and thus speed up PrefixQuery,
    RangeQuery, WildcardQuery, FuzzyQuery, RangeFilter, DateFilter,
    and sorting the first time on a field.
    (Yonik Seeley, LUCENE-454)

11. Optimized and generalized 32 bit floating point to byte
    (custom 8 bit floating point) conversions. Increased the speed of
    Similarity.encodeNorm() anywhere from 10% to 250%, depending on the JVM.
    (Yonik Seeley, LUCENE-467)

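Optimization 11 above refers to packing a 32-bit float into a single byte. The sketch below shows one bit-shifting way to do this: keep the high-order bits of the IEEE 754 representation and rebase them so they fit in 8 bits. The class name and the constants are assumptions made for illustration; whether they match the exact layout used by Similarity.encodeNorm() is not confirmed by this change log.

```java
// Hypothetical 8-bit float codec built from bit shifts; offered as an
// illustration of the technique, not as Lucene's actual implementation.
class SmallFloatSketch {

    // Smallest encodable magnitude, expressed in the shifted bit space.
    private static final int ZERO_POINT = (63 - 15) << 3;

    /** Encodes a float into one byte; precision is lost, range is clamped. */
    static byte floatToByte(float f) {
        int bits = Float.floatToRawIntBits(f);
        int small = bits >> (24 - 3); // drop the 21 least-significant mantissa bits
        if (small <= ZERO_POINT) {
            // Zero and negative values map to 0, positive underflow to the smallest code.
            return (bits <= 0) ? (byte) 0 : (byte) 1;
        }
        if (small >= ZERO_POINT + 0x100) {
            return (byte) -1; // overflow maps to the largest code (0xFF)
        }
        return (byte) (small - ZERO_POINT);
    }

    /** Decodes a byte produced by floatToByte back into a float. */
    static float byteToFloat(byte b) {
        if (b == 0) {
            return 0.0f;
        }
        int bits = (b & 0xff) << (24 - 3);
        bits += (63 - 15) << 24; // undo the rebasing applied during encoding
        return Float.intBitsToFloat(bits);
    }

    public static void main(String[] args) {
        System.out.println(byteToFloat(floatToByte(1.0f))); // 1.0
        System.out.println(byteToFloat(floatToByte(0.9f))); // 0.875
        System.out.println(floatToByte(0.0f));              // 0
    }
}
```

The encoding needs no table lookup and no floating-point arithmetic, which is the kind of property that makes such a conversion cheap. In this layout there are four representable values per power of two, so 0.9 is truncated to 0.875.
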
Infrastructure

 1. Lucene's source code repository has converted from CVS to
    Subversion. The new repository is at
    http://svn.apache.org/repos/asf/lucene/java/trunk

 2. Lucene's issue tracker has migrated from Bugzilla to JIRA.
    Lucene's JIRA is at http://issues.apache.org/jira/browse/LUCENE
    The old issues are still available at
    http://issues.apache.org/bugzilla/show_bug.cgi?id=xxxx
    (use the bug number instead of xxxx)


1.4.3

 1. The JSP demo page (src/jsp/results.jsp) now properly escapes error
    messages which might contain user input (e.g. error messages about
    query parsing). If you used that page as a starting point for your
    own code please make sure your code also properly escapes HTML
    characters from user input in order to avoid so-called cross site
    scripting attacks. (Daniel Naber)

 2. QueryParser changes in 1.4.2 broke the QueryParser API. Now the old
    API is supported again. (Christoph)


1.4.2

 1. Fixed bug #31241: Sorting could lead to incorrect results (documents
    missing, others duplicated) if the sort keys were not unique and there
    were more than 100 matches. (Daniel Naber)

 2. Memory leak in Sort code (bug #31240) eliminated.
    (Rafal Krzewski via Christoph and Daniel)

 3. FuzzyQuery now takes an additional parameter that specifies the
    minimum similarity that is required for a term to match the query.
    The QueryParser syntax for this is term~x, where x is a floating
    point number >= 0 and < 1 (a bigger number means that a higher
    similarity is required). Furthermore, a prefix can be specified
    for FuzzyQuerys so that only those terms are considered similar that
    start with this prefix. This can speed up FuzzyQuery greatly.
    (Daniel Naber, Christoph Goller)

 4. PhraseQuery and PhrasePrefixQuery now allow the explicit specification
    of relative positions. (Christoph Goller)

 5. QueryParser changes: Fix for ArrayIndexOutOfBoundsExceptions
    (patch #9110); some unused method parameters removed; The ability
    to specify a minimum similarity for FuzzyQuery has been added.
    (Christoph Goller)

 6. IndexSearcher optimization: a new ScoreDoc is no longer allocated
    for every non-zero-scoring hit. This makes 'OR' queries that
    contain common terms substantially faster. (cutting)


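Entry 3 of 1.4.2 above introduces a minimum similarity and a required prefix for fuzzy matching. The sketch below illustrates how the two interact. It assumes similarity is defined as 1 minus the edit distance divided by the length of the shorter term; that formula, the class name, and the method names are assumptions for illustration and are not taken from this change log.

```java
// Hypothetical fuzzy matcher; the similarity formula is an assumption.
class FuzzyMatchSketch {

    /** Classic Levenshtein edit distance using two rolling rows. */
    static int editDistance(String a, String b) {
        int[] previous = new int[b.length() + 1];
        int[] current = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            previous[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            current[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int substitution = previous[j - 1] + (a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1);
                int deletion = previous[j] + 1;
                int insertion = current[j - 1] + 1;
                current[j] = Math.min(substitution, Math.min(deletion, insertion));
            }
            int[] swap = previous;
            previous = current;
            current = swap;
        }
        return previous[b.length()];
    }

    /** Assumed definition: 1 - distance / length of the shorter term. */
    static double similarity(String a, String b) {
        int shorter = Math.min(a.length(), b.length());
        if (shorter == 0) {
            return a.length() == b.length() ? 1.0 : 0.0;
        }
        return 1.0 - (double) editDistance(a, b) / shorter;
    }

    /** A candidate matches if it shares the prefix and is similar enough. */
    static boolean matches(String queryTerm, String candidate, double minSimilarity, int prefixLength) {
        int length = Math.min(prefixLength, queryTerm.length());
        if (!candidate.startsWith(queryTerm.substring(0, length))) {
            return false; // cheap rejection, no edit distance computed
        }
        return similarity(queryTerm, candidate) >= minSimilarity;
    }

    public static void main(String[] args) {
        System.out.println(matches("lucene", "lucent", 0.5, 3)); // true
        System.out.println(matches("lucene", "xucene", 0.5, 1)); // false
    }
}
```

The prefix test runs before the edit distance computation, which is why a non-zero prefix can reduce the work per candidate term.
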
1.4.1

 1. Fixed a performance bug in hit sorting code, where values were not
    correctly cached. (Aviran via cutting)

 2. Fixed errors in file format documentation. (Daniel Naber)


1.4 final

 1. Added "an" to the list of stop words in StopAnalyzer, to complement
    the existing "a" there. Fix for bug 28960
    (http://issues.apache.org/bugzilla/show_bug.cgi?id=28960). (Otis)

 2. Added new class FieldCache to manage in-memory caches of field term
    values. (Tim Jones)

 3. Added overloaded getFieldQuery method to QueryParser which
    accepts the slop factor specified for the phrase (or the default
    phrase slop for the QueryParser instance). This allows overriding
    methods to replace a PhraseQuery with a SpanNearQuery instead,
    keeping the proper slop factor. (Erik Hatcher)

 4. Changed the encoding of GermanAnalyzer.java and GermanStemmer.java to
    UTF-8 and changed the build encoding to UTF-8, to make changed files
    compile. (Otis Gospodnetic)

 5. Removed synchronization from term lookup under IndexReader methods
    termFreq(), termDocs() or termPositions() to improve
    multi-threaded performance. (cutting)

 6. Fix a bug where obsolete segment files were not deleted on Win32.


1.4 RC3

 1. Fixed several search bugs introduced by the skipTo() changes in
    release 1.4RC1. The index file format was changed a bit, so
    collections must be re-indexed to take advantage of the skipTo()
    optimizations. (Christoph Goller)

 2. Added new Document methods, removeField() and removeFields().
    (Christoph Goller)

 3. Fixed inconsistencies with index closing. Indexes and directories
    are now only closed automatically by Lucene when Lucene opened
    them automatically. (Christoph Goller)

 4. Added new class: FilteredQuery. (Tim Jones)

 5. Added a new SortField type for custom comparators. (Tim Jones)

 6. Lock obtain timed out message now displays the full path to the lock
    file. (Daniel Naber via Erik)

 7. Fixed a bug in SpanNearQuery when ordered. (Paul Elschot via cutting)

 8. Fixed so that FSDirectory's locks still work when the
    java.io.tmpdir system property is null. (cutting)

 9. Changed FilteredTermEnum's constructor to take no parameters,
    as the parameters were ignored anyway (bug #28858)


1.4 RC2

 1. GermanAnalyzer now throws an exception if the stopword file
    cannot be found (bug #27987). It now uses LowerCaseFilter
    (bug #18410) (Daniel Naber via Otis, Erik)

 2. Fixed a few bugs in the file format documentation. (cutting)


1.4 RC1

 1. Changed the format of the .tis file, so that:

    - it has a format version number, which makes it easier to
      back-compatibly change file formats in the future.

    - the term count is now stored as a long. This was the one aspect
      of Lucene's file formats which limited index size.

    - a few internal index parameters are now stored in the index, so
      that they can (in theory) now be changed from index to index,
      although there is not yet an API to do so.

    These changes are back compatible. The new code can read old
    indexes. But old code will not be able to read new indexes. (cutting)

 2. Added an optimized implementation of TermDocs.skipTo(). A skip
    table is now stored for each term in the .frq file. This only
    adds a percent or two to overall index size, but can substantially
    speed up many searches. (cutting)

 3. Restructured the Scorer API and all Scorer implementations to take
    advantage of an optimized TermDocs.skipTo() implementation. In
    particular, PhraseQuerys and conjunctive BooleanQuerys are
    faster when one clause has substantially fewer matches than the
    others. (A conjunctive BooleanQuery is a BooleanQuery where all
    clauses are required.) (cutting)

 4. Added new class ParallelMultiSearcher. Combined with
    RemoteSearchable this makes it easy to implement distributed
    search systems. (Jean-Francois Halleux via cutting)

 5. Added support for hit sorting. Results may now be sorted by any
    indexed field. For details see the javadoc for
    Searcher#search(Query, Sort). (Tim Jones via Cutting)

 6. Changed FSDirectory to auto-create a full directory tree that it
    needs by using mkdirs() instead of mkdir(). (Mladen Turk via Otis)

 7. Added a new span-based query API. This implements, among other
    things, nested phrases. See javadocs for details. (Doug Cutting)

 8. Added new method Query.getSimilarity(Searcher), and changed
    scorers to use it. This permits one to subclass a Query class so
    that it can specify its own Similarity implementation, perhaps
    one that delegates through that of the Searcher. (Julien Nioche
    via Cutting)

 9. Added MultiReader, an IndexReader that combines multiple other
    IndexReaders. (Cutting)

10. Added support for term vectors. See Field#isTermVectorStored().
    (Grant Ingersoll, Cutting & Dmitry)

11. Fixed the old bug with escaping of special characters in query
    strings: http://issues.apache.org/bugzilla/show_bug.cgi?id=24665
    (Jean-Francois Halleux via Otis)

12. Added support for overriding default values for the following,
    using system properties:
    - default commit lock timeout
    - default maxFieldLength
    - default maxMergeDocs
    - default mergeFactor
    - default minMergeDocs
    - default write lock timeout
    (Otis)

13. Changed QueryParser.jj to allow '-' and '+' within tokens:
    http://issues.apache.org/bugzilla/show_bug.cgi?id=27491
    (Morus Walter via Otis)

14. Changed so that the compound index format is used by default.
    This makes indexing a bit slower, but vastly reduces the chances
    of file handle problems. (Cutting)


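Entries 2 and 3 of 1.4 RC1 above describe a per-term skip table and conjunctive queries that benefit from skipTo(). The sketch below models the idea in memory: a sorted posting list, a table holding the first document of every block, and an intersection that lets each list leap to the other's current document. All names and the block size are hypothetical, and the on-disk .frq layout is not modeled.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory posting list with a skip table; not Lucene's TermDocs.
class SkipToSketch {
    private final int[] docs;       // strictly increasing document ids
    private final int[] skipDocs;   // first document id of every block
    private final int skipInterval; // block size
    private int index = 0;          // current position in docs

    SkipToSketch(int[] sortedDocs, int skipInterval) {
        this.docs = sortedDocs;
        this.skipInterval = skipInterval;
        int entries = (sortedDocs.length + skipInterval - 1) / skipInterval;
        this.skipDocs = new int[entries];
        for (int i = 0; i < entries; i++) {
            skipDocs[i] = sortedDocs[i * skipInterval];
        }
    }

    /** Advances to the first document >= target; returns false when exhausted. */
    boolean skipTo(int target) {
        int i = index;
        int block = i / skipInterval;
        // Jump over whole blocks: if the next block starts at or before the
        // target, every document in the current block is smaller than the target.
        while (block + 1 < skipDocs.length && skipDocs[block + 1] <= target) {
            block++;
        }
        i = Math.max(i, block * skipInterval);
        // Finish with a short linear scan inside the block.
        while (i < docs.length && docs[i] < target) {
            i++;
        }
        index = i;
        return i < docs.length;
    }

    int doc() {
        return docs[index];
    }

    /** Conjunction of two lists (document ids assumed non-negative). */
    static List<Integer> intersect(SkipToSketch a, SkipToSketch b) {
        List<Integer> out = new ArrayList<>();
        if (!a.skipTo(0) || !b.skipTo(0)) {
            return out;
        }
        while (true) {
            if (a.doc() == b.doc()) {
                int match = a.doc();
                out.add(match);
                if (!a.skipTo(match + 1) || !b.skipTo(a.doc())) {
                    return out;
                }
            } else if (a.doc() < b.doc()) {
                if (!a.skipTo(b.doc())) {
                    return out;
                }
            } else if (!b.skipTo(a.doc())) {
                return out;
            }
        }
    }

    public static void main(String[] args) {
        int[] evens = new int[100];
        for (int i = 0; i < evens.length; i++) {
            evens[i] = 2 * i;
        }
        SkipToSketch common = new SkipToSketch(evens, 16);
        SkipToSketch rare = new SkipToSketch(new int[] {50, 51, 150}, 16);
        System.out.println(intersect(common, rare)); // [50, 150]
    }
}
```

When one list is much shorter than the other, the long list is mostly traversed block by block, which matches the entry's remark that conjunctive queries gain most when one clause has far fewer matches.
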
1.3 final

 1. Added catch of BooleanQuery$TooManyClauses in QueryParser to
    throw ParseException instead. (Erik Hatcher)

 2. Fixed a NullPointerException in Query.explain(). (Doug Cutting)

 3. Added a new method IndexReader.setNorm(), that permits one to
    alter the boosting of fields after an index is created.

 4. Distinguish between the final position and length when indexing a
    field. The length is now defined as the total number of tokens,
    instead of the final position, as it was previously. Length is
    used for score normalization (Similarity.lengthNorm()) and for
    controlling memory usage (IndexWriter.maxFieldLength). In both of
    these cases, the total number of tokens is a better value to use
    than the final token position. Position is used in phrase
    searching (see PhraseQuery and Token.setPositionIncrement()).

 5. Fix StandardTokenizer's handling of CJK characters (Chinese,
    Japanese and Korean ideograms). Previously contiguous sequences
    were combined in a single token, which is not very useful. Now
    each ideogram generates a separate token, which is more useful.


1.3 RC3

 1. Added minMergeDocs in IndexWriter. This can be raised to speed
    indexing without altering the number of files, but only using more
    memory. (Julien Nioche via Otis)

 2. Fix bug #24786, in query rewriting. (bschneeman via Cutting)

 3. Fix bug #16952, in demo HTML parser, skip comments in
    javascript. (Christoph Goller)

 4. Fix bug #19253, in demo HTML parser, add whitespace as needed to
    output (Daniel Naber via Christoph Goller)

 5. Fix bug #24301, in demo HTML parser, long titles no longer
    hang things. (Christoph Goller)

 6. Fix bug #23534, Replace use of file timestamp of segments file
    with an index version number stored in the segments file. This
    resolves problems when running on file systems with low-resolution
    timestamps, e.g., HFS under MacOS X. (Christoph Goller)

 7. Fix QueryParser so that TokenMgrError is not thrown, only
    ParseException. (Erik Hatcher)

 8. Fix some bugs introduced by change 11 of RC2. (Christoph Goller)

 9. Fixed a problem compiling TestRussianStem. (Christoph Goller)

10. Cleaned up some build stuff. (Erik Hatcher)


1.3 RC2

 1. Added getFieldNames(boolean) to IndexReader, SegmentReader, and
    SegmentsReader. (Julien Nioche via otis)

 2. Changed file locking to place lock files in
    System.getProperty("java.io.tmpdir"), where all users are
    permitted to write files. This way folks can open and correctly
    lock indexes which are read-only to them.

 3. IndexWriter: added a new method, addDocument(Document, Analyzer),
    permitting one to easily use different analyzers for different
    documents in the same index.

 4. Minor enhancements to FuzzyTermEnum.
    (Christoph Goller via Otis)

 5. PriorityQueue: added insert(Object) method and adjusted IndexSearcher
    and MultiIndexSearcher to use it.
    (Christoph Goller via Otis)

 6. Fixed a bug in IndexWriter that returned incorrect docCount().
    (Christoph Goller via Otis)

 7. Fixed SegmentsReader to eliminate the confusing and slightly different
    behaviour of TermEnum when dealing with an enumeration of all terms,
    versus an enumeration starting from a specific term.
    This patch also fixes incorrect term document frequencies when the same term
    is present in multiple segments.
    (Christoph Goller via Otis)

 8. Added CachingWrapperFilter and PerFieldAnalyzerWrapper. (Erik Hatcher)

 9. Added support for the new "compound file" index format (Dmitry
    Serebrennikov)

10. Added Locale setting to QueryParser, for use by date range parsing.

11. Changed IndexReader so that it can be subclassed by classes
    outside of its package. Previously it had package-private
    abstract methods. Also modified the index merging code so that it
    can work on an arbitrary IndexReader implementation, and added a
    new method, IndexWriter.addIndexes(IndexReader[]), to take
    advantage of this. (cutting)

12. Added a limit to the number of clauses which may be added to a
    BooleanQuery. The default limit is 1024 clauses. This should
    stop most OutOfMemoryErrors caused by prefix, wildcard and fuzzy
    queries which run amok. (cutting)

13. Add new method: IndexReader.undeleteAll(). This undeletes all
    deleted documents which still remain in the index. (cutting)


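Entry 12 of 1.3 RC2 above caps the number of clauses so that a term-expanding query fails fast instead of exhausting memory. The sketch below shows the guard in isolation. ClauseLimitSketch, TooManyClausesException, and expandPrefix are hypothetical names; only the default of 1024 comes from the entry.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of a clause limit on an expanding query; not Lucene's BooleanQuery.
class ClauseLimitSketch {

    static class TooManyClausesException extends RuntimeException {
        TooManyClausesException(int limit) {
            super("more than " + limit + " clauses");
        }
    }

    static final int DEFAULT_MAX_CLAUSES = 1024; // default stated in the entry above

    private final int maxClauses;
    private final List<String> clauses = new ArrayList<>();

    ClauseLimitSketch(int maxClauses) {
        this.maxClauses = maxClauses;
    }

    /** Adds one clause, failing as soon as the limit would be exceeded. */
    void add(String term) {
        if (clauses.size() >= maxClauses) {
            throw new TooManyClausesException(maxClauses);
        }
        clauses.add(term);
    }

    int size() {
        return clauses.size();
    }

    /** Expands a prefix into one clause per matching dictionary term. */
    static ClauseLimitSketch expandPrefix(String prefix, List<String> dictionary, int maxClauses) {
        ClauseLimitSketch query = new ClauseLimitSketch(maxClauses);
        for (String term : dictionary) {
            if (term.startsWith(prefix)) {
                query.add(term);
            }
        }
        return query;
    }

    public static void main(String[] args) {
        List<String> dictionary = new ArrayList<>();
        for (int i = 0; i < 2000; i++) {
            dictionary.add("term" + i);
        }
        System.out.println(expandPrefix("term19", dictionary, DEFAULT_MAX_CLAUSES).size()); // 111
        try {
            expandPrefix("term", dictionary, DEFAULT_MAX_CLAUSES);
        } catch (TooManyClausesException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```
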
2003-03-20 13:15:04 -05:00
|
|
|
1.3 RC1
|
2002-06-02 15:15:18 -04:00
|
|
|
|
2002-06-05 14:42:46 -04:00
|
|
|
1. Fixed PriorityQueue's clear() method.
|
|
|
|
Fix for bug 9454, http://nagoya.apache.org/bugzilla/show_bug.cgi?id=9454
|
|
|
|
(Matthijs Bomhoff via otis)
|
|
|
|
|
|
|
|
2. Changed StandardTokenizer.jj grammar for EMAIL tokens.
|
|
|
|
Fix for bug 9015, http://nagoya.apache.org/bugzilla/show_bug.cgi?id=9015
|
|
|
|
(Dale Anson via otis)
|
2002-05-14 11:24:05 -04:00
|
|
|
|
2002-06-27 12:30:20 -04:00
|
|
|
3. Added the ability to disable lock creation by using disableLuceneLocks
|
|
|
|
system property. This is useful for read-only media, such as CD-ROMs.
|
2002-06-21 12:41:38 -04:00
|
|
|
(otis)
|
2002-06-29 18:34:09 -04:00
|
|
|
|
2002-06-21 12:54:00 -04:00
|
|
|
4. Added id method to Hits to be able to access the index global id.
|
2002-06-29 18:10:37 -04:00
|
|
|
Required for sorting options.
|
|
|
|
(carlson)
|
|
|
|
|
2002-06-29 18:34:09 -04:00
|
|
|
5. Added support for new range query syntax to QueryParser.jj.
|
|
|
|
(briangoetz)
|
|
|
|
|
2002-07-17 19:26:26 -04:00
|
|
|
6. Added the ability to retrieve HTML documents' META tag values to
|
|
|
|
HTMLParser.jj.
|
2002-06-29 18:10:37 -04:00
|
|
|
(Mark Harwood via otis)
|
2002-06-29 18:34:09 -04:00
|
|
|
|
2002-07-14 23:58:06 -04:00
|
|
|
7. Modified QueryParser to make it possible to programmatically specify the
|
|
|
|
default Boolean operator (OR or AND).
|
2004-05-24 15:05:21 -04:00
|
|
|
(Péter Halácsy via otis)
|
2002-07-14 23:58:06 -04:00
|
|
|
|
2002-07-17 19:26:26 -04:00
|
|
|
8. Made many search methods and classes non-final, per requests.
|
|
|
|
This includes IndexWriter and IndexSearcher, among others.
|
|
|
|
(cutting)
|
2002-07-25 02:11:35 -04:00
|
|
|
|
2002-07-17 19:26:26 -04:00
|
|
|
9. Added class RemoteSearchable, providing support for remote
|
|
|
|
searching via RMI. The test class RemoteSearchableTest.java
|
|
|
|
provides an example of how this can be used. (cutting)
|
|
|
|
|
2002-07-18 10:40:51 -04:00
|
|
|
10. Added PhrasePrefixQuery (and supporting MultipleTermPositions). The
|
|
|
|
test class TestPhrasePrefixQuery provides the usage example.
|
|
|
|
(Anders Nielsen via otis)
|
2002-07-17 19:26:26 -04:00
|
|
|
|
2002-07-26 13:32:54 -04:00
|
|
|
11. Changed the German stemming algorithm to ignore case while
|
|
|
|
stripping. The new algorithm is faster and produces more equal
|
|
|
|
stems from nouns and verbs derived from the same word.
|
2002-07-25 02:11:35 -04:00
|
|
|
(gschwarz)
|
|
|
|
|
2002-07-29 15:11:15 -04:00
|
|
|
12. Added support for boosting the score of documents and fields via
|
|
|
|
the new methods Document.setBoost(float) and Field.setBoost(float).
|
|
|
|
|
|
|
|
Note: This changes the encoding of an indexed value. Indexes
|
|
|
|
should be re-created from scratch in order for search scores to
|
|
|
|
be correct. With the new code and an old index, searches will
|
|
|
|
yield very large scores for shorter fields, and very small scores
|
|
|
|
for longer fields. Once the index is re-created, scores will be
|
|
|
|
as before. (cutting)
|
|
|
|
|
2002-08-05 13:39:03 -04:00
|
|
|
13. Added new method Token.setPositionIncrement().

This permits, for the purpose of phrase searching, placing
multiple terms in a single position. This is useful with
stemmers that produce multiple possible stems for a word.

This also permits the introduction of gaps between terms, so that
terms which are adjacent in a token stream will not be matched by
an exact phrase query. This makes it possible, e.g., to build
an analyzer where phrases are not matched over stop words which
have been removed.

Finally, repeating a token with an increment of zero can also be
used to boost scores of matches on that token. (cutting)

14. Added new Filter class, QueryFilter. This constrains search
results to only match those which also match a provided query.
Results are cached, so that searches after the first on the same
index using this filter are very fast.

This could be used, for example, with a RangeQuery on a formatted
date field to implement date filtering. One could re-use a
single QueryFilter that matches, e.g., only documents modified
within the last week. The QueryFilter and RangeQuery would only
need to be reconstructed once per day. (cutting)

15. Added a new IndexWriter method, getAnalyzer(). This returns the
analyzer used when adding documents to this index. (cutting)

16. Fixed a bug with IndexReader.lastModified(). Before, document
deletion did not update this. Now it does. (cutting)

17. Added Russian Analyzer.
(Boris Okner via otis)

18. Added a public, extensible scoring API. For details, see the
javadoc for org.apache.lucene.search.Similarity.

19. Fixed the return type of Hits.id(), changing it from float to int.
(Terry Steichen via Peter)

20. Added getFieldNames() to IndexReader and Segment(s)Reader classes.
(Peter Mularien via otis)

21. Added getFields(String) and getValues(String) methods.
Contributed by Rasik Pandey on 2002-10-09
(Rasik Pandey via otis)

22. Revised internal search APIs. Changes include:

a. Queries are no longer modified during a search. This makes
it possible, e.g., to reuse the same query instance with
multiple indexes from multiple threads.

b. Term-expanding queries (e.g. PrefixQuery, WildcardQuery,
etc.) now work correctly with MultiSearcher, fixing bugs 12619
and 12667.

c. Boosting a BooleanQuery now works, and is supported by the
query parser (problem reported by Lee Mallabone). Thus a query
like "(+foo +bar)^2 +baz" is now supported and equivalent to
"(+foo^2 +bar^2) +baz".

d. New method: Query.rewrite(IndexReader). This permits a
query to re-write itself as an alternate, more primitive query.
Most of the term-expanding query classes (PrefixQuery,
WildcardQuery, etc.) are now implemented using this method.

e. New method: Searchable.explain(Query q, int doc). This
returns an Explanation instance that describes how a particular
document is scored against a query. An explanation can be
displayed as either plain text, with the toString() method, or
as HTML, with the toHtml() method. Note that computing an
explanation is as expensive as executing the query over the
entire index. This is intended to be used in developing
Similarity implementations, and, for good performance, should
not be displayed with every hit.

f. Scorer and Weight are public, not package protected. It is now
possible for someone to write a Scorer implementation that is
not in the org.apache.lucene.search package. This is still
fairly advanced programming, and I don't expect anyone to do
this anytime soon, but at least now it is possible.

g. Added public accessors to the primitive query classes
(TermQuery, PhraseQuery and BooleanQuery), permitting access to
their terms and clauses.

Caution: These are extensive changes and they have not yet been
tested extensively. Bug reports are appreciated.
(cutting)

23. Added convenience RAMDirectory constructors taking File and String
arguments, for easy FSDirectory to RAMDirectory conversion.
(otis)

24. Added code for manual renaming of files in FSDirectory, since it
has been reported that java.io.File's renameTo(File) method sometimes
fails on Windows JVMs.
(Matt Tucker via otis)

25. Refactored QueryParser to make it easier for people to extend it.
Added the ability to automatically lower-case Wildcard terms in
the QueryParser.
(Tatu Saloranta via otis)

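Item 13 above describes position increments only in words. The following is a minimal sketch of the idea in plain Java, not Lucene code: the names PositionIncrementSketch, Tok, index and matchesPhrase are hypothetical, and the sketch assumes that an exact phrase matches only when its terms sit at consecutive positions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model of token position increments; illustrative only, not Lucene code. */
final class PositionIncrementSketch {

    /** A term plus its distance from the previous token's position (hypothetical type). */
    static final class Tok {
        final String term;
        final int increment;

        Tok(String term, int increment) {
            this.term = term;
            this.increment = increment;
        }
    }

    /** Maps each term to the absolute positions at which it occurs. */
    static Map<String, List<Integer>> index(List<Tok> stream) {
        Map<String, List<Integer>> postings = new HashMap<>();
        int position = -1; // so that a first token with increment 1 lands on position 0
        for (Tok t : stream) {
            position += t.increment;
            postings.computeIfAbsent(t.term, k -> new ArrayList<>()).add(position);
        }
        return postings;
    }

    /** Returns true if the terms occur at consecutive positions, i.e. as an exact phrase. */
    static boolean matchesPhrase(Map<String, List<Integer>> postings, String... phrase) {
        if (phrase.length == 0) {
            return false;
        }
        for (int start : postings.getOrDefault(phrase[0], Collections.<Integer>emptyList())) {
            boolean allFound = true;
            for (int i = 1; i < phrase.length && allFound; i++) {
                allFound = postings.getOrDefault(phrase[i], Collections.<Integer>emptyList())
                        .contains(start + i);
            }
            if (allFound) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // "fast" is stacked on "quick" (increment 0), so both form a phrase with "fox".
        Map<String, List<Integer>> stacked = index(Arrays.asList(
                new Tok("quick", 1), new Tok("fast", 0), new Tok("fox", 1)));
        System.out.println(matchesPhrase(stacked, "quick", "fox")); // true
        System.out.println(matchesPhrase(stacked, "fast", "fox"));  // true

        // "rock the boat" with the stop word removed; increment 2 keeps the gap.
        Map<String, List<Integer>> gapped = index(Arrays.asList(
                new Tok("rock", 1), new Tok("boat", 2)));
        System.out.println(matchesPhrase(gapped, "rock", "boat"));  // false
    }
}
```

An increment of zero stacks a token on the previous position, and an increment greater than one leaves a gap that an exact phrase cannot bridge.
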
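The caching behaviour described in item 14 above can be pictured roughly as follows. This is a sketch under stated assumptions, not QueryFilter's implementation: CachedFilterSketch and ToyIndex are hypothetical names, the "query" is an ordinary predicate over strings, and the cache is assumed to be keyed weakly by the index so that entries vanish once an index is no longer referenced.

```java
import java.util.BitSet;
import java.util.Map;
import java.util.WeakHashMap;
import java.util.function.Predicate;

/** Toy cached filter; illustrates the caching idea only, not Lucene's QueryFilter. */
final class CachedFilterSketch {

    /** Stand-in for an index: document i is docs[i] (hypothetical type). */
    static final class ToyIndex {
        final String[] docs;

        ToyIndex(String... docs) {
            this.docs = docs;
        }
    }

    private final Predicate<String> query;
    // Weak keys let an entry disappear once its index is no longer referenced.
    private final Map<ToyIndex, BitSet> cache = new WeakHashMap<>();
    private int evaluations = 0;

    CachedFilterSketch(Predicate<String> query) {
        this.query = query;
    }

    /** Returns the set of matching document numbers, computing it at most once per index. */
    synchronized BitSet bits(ToyIndex index) {
        BitSet bits = cache.get(index);
        if (bits == null) {
            bits = new BitSet(index.docs.length);
            for (int doc = 0; doc < index.docs.length; doc++) {
                if (query.test(index.docs[doc])) {
                    bits.set(doc);
                }
            }
            evaluations++;
            cache.put(index, bits);
        }
        return bits;
    }

    /** Number of times the underlying query has actually been run. */
    synchronized int evaluations() {
        return evaluations;
    }

    public static void main(String[] args) {
        // Dates formatted so that string order equals chronological order.
        ToyIndex index = new ToyIndex("2002-08-01", "2002-07-01", "2002-08-05");
        CachedFilterSketch lastWeek =
                new CachedFilterSketch(date -> date.compareTo("2002-07-29") >= 0);
        System.out.println(lastWeek.bits(index));   // {0, 2}
        System.out.println(lastWeek.bits(index));   // {0, 2}, served from the cache
        System.out.println(lastWeek.evaluations()); // 1
    }
}
```

A search would then keep only the hits whose document number is set in the returned bit set. Rebuilding the filter object once per day, as the item suggests, amounts to discarding this cache and starting over with a new cutoff date.
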
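Item 24 above mentions a manual rename for the case where java.io.File.renameTo(File) fails. One plausible shape for such a fallback is sketched below; it is not the FSDirectory code. The class and method names are hypothetical, and the choice to delete an existing target first is an assumption.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

/** Rename with a manual copy-and-delete fallback; a sketch, not FSDirectory's code. */
final class RenameFallbackSketch {

    /** Tries File.renameTo first and falls back to a manual copy if it reports failure. */
    static void rename(File from, File to) throws IOException {
        // Assumption: an existing target should be replaced.
        if (to.exists() && !to.delete()) {
            throw new IOException("Cannot delete " + to);
        }
        if (!from.renameTo(to)) {
            copyThenDelete(from, to);
        }
    }

    /** The manual path: stream the bytes across, then remove the source. */
    static void copyThenDelete(File from, File to) throws IOException {
        try (InputStream in = new FileInputStream(from);
             OutputStream out = new FileOutputStream(to)) {
            byte[] buffer = new byte[8192];
            int read;
            while ((read = in.read(buffer)) != -1) {
                out.write(buffer, 0, read);
            }
        }
        if (!from.delete()) {
            throw new IOException("Cannot delete " + from);
        }
    }

    public static void main(String[] args) throws IOException {
        File from = File.createTempFile("segments", ".new");
        File to = new File(from.getParentFile(), from.getName() + ".renamed");
        try (OutputStream out = new FileOutputStream(from)) {
            out.write(new byte[] {1, 2, 3});
        }
        rename(from, to);
        System.out.println(to.length()); // size in bytes of the renamed file
        to.delete();
    }
}
```

Unlike a successful renameTo, the copy path is not atomic: a crash between the copy and the delete can leave both files behind.
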
1.2 RC6

1. Changed QueryParser.jj to treat "?" as a special character,
allowing it to be used in wildcard terms. Also updated the
TestWildcard unit test. (Ralf Hettesheimer via carlson)

1.2 RC5

1. Renamed build.properties to default.properties and updated
the BUILD.txt document to describe how to override the
default.properties settings without having to edit the file. This
brings the build process closer to Scarab's build process.
(jon)

2. Added MultiFieldQueryParser class. (Kelvin Tan, via otis)

3. Updated "powered by" links. (otis)

4. Fixed instruction for setting up JavaCC - Bug #7017 (otis)

5. FSDirectory now throws an exception if it cannot create the directory
- Bug #6914 (Eugene Gluzberg via otis)

6. Updated MultiSearcher, MultiFieldParse, Constants, DateFilter,
LowerCaseTokenizer javadoc (otis)

7. Added fix to avoid NullPointerException in results.jsp
(Mark Hayes via otis)

8. Changed Wildcard search to match zero or more characters instead of
one or more (Lee Mallabone, via otis)

9. Fixed an offset error in GermanStemFilter - Bug #7412
(Rodrigo Reyes, via otis)

10. Added unit tests for wildcard search and DateFilter (otis)

11. Allow co-existence of indexed and non-indexed fields with the same name
(cutting/casper, via otis)

12. Added an escape character to the query parser.
(briangoetz)

13. Applied a patch that ensures that searches that use DateFilter
don't throw an exception when no matches are found. (David Smiley, via
otis)

14. Fixed bugs in the DateFilter and WildcardQuery unit tests. (cutting, otis, carlson)

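Item 8 above changes what a wildcard may match. A minimal sketch of the resulting semantics follows, assuming "*" stands for zero or more characters and "?" for exactly one; it is a generic matcher written for illustration, not the WildcardQuery code, and WildcardSketch is a hypothetical name.

```java
/** Generic wildcard matcher; illustrates the semantics only, not Lucene's code. */
final class WildcardSketch {

    /** Returns true if the whole text matches the pattern. */
    static boolean matches(String pattern, String text) {
        int p = 0;          // current index into pattern
        int t = 0;          // current index into text
        int star = -1;      // index of the most recent '*' in pattern, or -1
        int mark = 0;       // text index at which that '*' currently stops consuming
        while (t < text.length()) {
            if (p < pattern.length() && pattern.charAt(p) == '*') {
                star = p++;             // let '*' match zero characters for now
                mark = t;
            } else if (p < pattern.length()
                    && (pattern.charAt(p) == '?' || pattern.charAt(p) == text.charAt(t))) {
                p++;
                t++;
            } else if (star >= 0) {
                p = star + 1;           // backtrack: let the last '*' take one more character
                t = ++mark;
            } else {
                return false;
            }
        }
        while (p < pattern.length() && pattern.charAt(p) == '*') {
            p++;                        // trailing '*' may match nothing
        }
        return p == pattern.length();
    }

    public static void main(String[] args) {
        System.out.println(matches("foo*", "foo"));  // true: '*' matches zero characters
        System.out.println(matches("foo*", "food")); // true
        System.out.println(matches("f?o", "fo"));    // false: '?' needs exactly one character
    }
}
```

Under the earlier one-or-more behaviour, the first call would have been expected to return false.
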
1.2 RC4

1. Updated contributions section of website.
Added XML Document #3 implementation to Document Section.
Also added Term Highlighting to Misc Section. (carlson)

2. Fixed NullPointerException for phrase searches containing
unindexed terms, introduced in 1.2RC3. (cutting)

3. Changed document deletion code to obtain the index write lock,
enforcing the fact that document addition and deletion cannot be
performed concurrently. (cutting)

4. Various documentation cleanups. (otis, acoliver)

5. Updated "powered by" links. (cutting, jon)

6. Fixed a bug in the GermanStemmer. (Bernhard Messer, via otis)

7. Changed Term and Query to implement Serializable. (scottganyo)

8. Fixed IndexWriter.addIndexes() so that it never deletes the indexes
being added. (cutting)

9. Upgraded to JUnit 3.7. (otis)

1.2 RC3

1. IndexWriter: fixed a bug where adding an optimized index to an
empty index failed. This was encountered using addIndexes to copy
a RAMDirectory index to an FSDirectory.

2. RAMDirectory: fixed a bug where RAMInputStream could not read
across more than a single buffer boundary.

3. Fix query parser so it accepts queries with Unicode characters.
(briangoetz)

4. Fix query parser so that PrefixQuery is used in preference to
WildcardQuery when there's only an asterisk at the end of the
term. Previously PrefixQuery would never be used.

5. Fix tests so they compile; fix ant file so it compiles tests
properly. Added test cases for Analyzers and PriorityQueue.

6. Updated demos, added Getting Started documentation. (acoliver)

7. Added 'contributions' section to website & docs. (carlson)

8. Removed JavaCC from source distribution for copyright reasons.
Folks must now download this separately from Metamata in order to
compile Lucene. (cutting)

9. Substantially improved the performance of DateFilter by adding the
ability to reuse TermDocs objects. (cutting)

10. Added IndexReader methods:
public static boolean indexExists(String directory);
public static boolean indexExists(File directory);
public static boolean indexExists(Directory directory);
public static boolean isLocked(Directory directory);
public static void unlock(Directory directory);
(cutting, otis)

11. Fixed bugs in GermanAnalyzer (gschwarz)

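The rule in item 4 above, preferring a prefix query when the only wildcard is a trailing asterisk, can be expressed as a small classification step. The sketch below is illustrative and not taken from the query parser: QueryKindSketch, Kind and classify are hypothetical names, and treating "?" as a wildcard character and a bare "*" as a wildcard rather than a prefix are assumptions.

```java
/** Decides which kind of query a raw term would map to; a sketch, not parser code. */
final class QueryKindSketch {

    enum Kind { TERM, PREFIX, WILDCARD }

    static Kind classify(String term) {
        int firstWildcard = -1;
        for (int i = 0; i < term.length(); i++) {
            char c = term.charAt(i);
            if (c == '*' || c == '?') {
                firstWildcard = i;
                break;
            }
        }
        if (firstWildcard < 0) {
            return Kind.TERM;
        }
        // PREFIX only when the single wildcard is a '*' in last place after some text.
        boolean onlyTrailingStar = firstWildcard > 0
                && firstWildcard == term.length() - 1
                && term.charAt(firstWildcard) == '*';
        return onlyTrailingStar ? Kind.PREFIX : Kind.WILDCARD;
    }

    public static void main(String[] args) {
        System.out.println(classify("foo"));  // TERM
        System.out.println(classify("foo*")); // PREFIX
        System.out.println(classify("f*o"));  // WILDCARD
    }
}
```
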
1.2 RC2, 19 October 2001:
- added sources to distribution
- removed broken build scripts and libraries from distribution
- SegmentsReader: fixed potential race condition
- FSDirectory: fixed so that getDirectory(xxx,true) correctly
erases the directory contents, even when the directory
has already been accessed in this JVM.
- RangeQuery: Fix issue where an inclusive range query would
include the nearest term in the index above a non-existent
specified upper term.
- SegmentTermEnum: Fix NullPointerException in clone() method
when the Term is null.
- JDK 1.1 compatibility fix: disabled lock files for JDK 1.1,
since they rely on a feature added in JDK 1.2.

1.2 RC1 (first Apache release), 2 October 2001:
- packages renamed from com.lucene to org.apache.lucene
- license switched from LGPL to Apache
- ant-only build -- no more makefiles
- addition of lock files--now fully thread & process safe
- addition of German stemmer
- MultiSearcher now supports low-level search API
- added RangeQuery, for term-range searching
- Analyzers can choose tokenizer based on field name
- misc bug fixes.

1.01b (last Sourceforge release), 2 July 2001
. a few bug fixes
. new Query Parser
. new prefix query (search for "foo*" matches "food")

1.0, 2000-10-04

This release fixes a few serious bugs and also includes some
performance optimizations, a stemmer, and a few other minor
enhancements.

0.04 2000-04-19

Lucene now includes a grammar-based tokenizer, StandardTokenizer.

The only tokenizer included in the previous release (LetterTokenizer)
identified terms consisting entirely of alphabetic characters. The
new tokenizer uses a regular-expression grammar to identify more
complex classes of terms, including numbers, acronyms, email
addresses, etc.

StandardTokenizer serves two purposes:

1. It is a much better, general purpose tokenizer for use by
applications as is.

The easiest way for applications to start using
StandardTokenizer is to use StandardAnalyzer.

2. It provides a good example of grammar-based tokenization.

If an application has special tokenization requirements, it can
implement a custom tokenizer by copying the directory containing
the new tokenizer into the application and modifying it
accordingly.

0.01, 2000-03-30

First open source release.

The code has been re-organized into a new package and directory
structure for this release. It builds OK, but has not been tested
beyond that since the re-organization.