2001-11-04 12:23:04 -05:00
|
|
|
Lucene Change Log
|
|
|
|
|
|
|
|
$Id$
|
|
|
|
|
2004-08-07 07:36:39 -04:00
|
|
|
1.9 RC1
|
|
|
|
|
2005-05-04 19:26:00 -04:00
|
|
|
Requirements
|
|
|
|
|
|
|
|
1. To compile and use Lucene you now need Java 1.4 or later.
|
|
|
|
|
2004-12-12 06:46:02 -05:00
|
|
|
Changes in runtime behavior
|
|
|
|
|
|
|
|
1. FuzzyQuery can no longer throw a TooManyClauses exception. If a
|
|
|
|
FuzzyQuery expands to more than BooleanQuery.maxClauseCount
|
|
|
|
terms only the BooleanQuery.maxClauseCount most similar terms
|
|
|
|
go into the rewritten query and thus the exception is avoided.
|
|
|
|
(Christoph)
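
The capping described above can be pictured as a bounded priority queue. The following is a minimal sketch, not Lucene's actual rewrite code: the class and method names (TopTerms, mostSimilar) are hypothetical, and the similarity scores are assumed to be supplied by the caller.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical helper: keeps only the N highest-scoring candidate terms. */
final class TopTerms {
    static final class Scored {
        final String term;
        final float similarity;
        Scored(String term, float similarity) {
            this.term = term;
            this.similarity = similarity;
        }
    }

    static List<String> mostSimilar(List<Scored> candidates, int maxClauseCount) {
        // Min-heap on similarity: the head is always the weakest term kept so far.
        PriorityQueue<Scored> heap = new PriorityQueue<Scored>(
            Math.max(1, maxClauseCount),
            new Comparator<Scored>() {
                public int compare(Scored a, Scored b) {
                    return Float.compare(a.similarity, b.similarity);
                }
            });
        for (Scored s : candidates) {
            heap.add(s);
            if (heap.size() > maxClauseCount) {
                heap.poll(); // drop the weakest term instead of throwing
            }
        }
        List<String> kept = new ArrayList<String>();
        while (!heap.isEmpty()) {
            kept.add(heap.poll().term);
        }
        Collections.reverse(kept); // most similar first
        return kept;
    }
}
```

The heap never grows beyond maxClauseCount + 1 entries, so the expansion stays bounded however many terms the fuzzy enumeration produces.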
|
|
|
|
|
2005-03-07 14:26:27 -05:00
|
|
|
2. Changed system property from "org.apache.lucene.lockdir" to
|
2004-12-12 06:46:02 -05:00
|
|
|
"org.apache.lucene.lockDir", so that its casing follows the existing
|
|
|
|
pattern used in other Lucene system properties. (Bernhard)
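
As an illustration of the renamed property, the sketch below reads the new name and falls back to java.io.tmpdir. The class name LockDirProperty is hypothetical, and the fallback is an assumption based on the RC2 note about lock files being placed in java.io.tmpdir; it is not a copy of Lucene's code.

```java
/** Hypothetical reader for the lock directory setting. */
final class LockDirProperty {
    static String resolve() {
        // Only the new, camel-cased name is consulted; the old lowercase
        // "org.apache.lucene.lockdir" is ignored in this sketch.
        return System.getProperty("org.apache.lucene.lockDir",
                                  System.getProperty("java.io.tmpdir"));
    }
}
```

On the command line this corresponds to passing -Dorg.apache.lucene.lockDir=/some/path to the JVM; start scripts that still use the lowercase name need updating.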
|
|
|
|
|
2004-12-14 18:13:34 -05:00
|
|
|
3. The terms of RangeQueries and FuzzyQueries are now converted to
|
|
|
|
lowercase by default (as has already been the case for PrefixQueries
|
|
|
|
and WildcardQueries before). Use setLowercaseExpandedTerms(false)
|
|
|
|
to disable that behavior but note that this also affects
|
|
|
|
PrefixQueries and WildcardQueries. (Daniel Naber)
|
2005-04-19 23:28:32 -04:00
|
|
|
|
|
|
|
4. Document frequency that is computed when MultiSearcher is used is now
|
|
|
|
computed correctly and "globally" across subsearchers and indices, while
|
|
|
|
before it used to be computed locally to each index, which caused
|
|
|
|
ranking across multiple indices not to be equivalent.
|
2005-04-20 16:39:18 -04:00
|
|
|
(Chuck Williams, Wolf Siberski via Otis, bug #31841)
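
To see why local document frequencies skew ranking, consider a term that is rare in one index and common in another. The sketch below is illustrative only: GlobalDocFreq is a hypothetical class, and the idf formula is the classic log(numDocs / (docFreq + 1)) + 1 shape, assumed here rather than taken from this change.

```java
/** Hypothetical illustration of local versus global document frequency. */
final class GlobalDocFreq {
    /** Sums a per-index statistic (docFreq or maxDoc) over all sub-indexes. */
    static int total(int[] perIndex) {
        int sum = 0;
        for (int value : perIndex) {
            sum += value;
        }
        return sum;
    }

    /** Assumed idf shape: rarer terms get a larger weight. */
    static double idf(int docFreq, int numDocs) {
        return Math.log((double) numDocs / (double) (docFreq + 1)) + 1.0;
    }
}
```

With docFreq 1 in one index and 99 in another (1000 documents each), the two local idf values differ widely, so the same term is weighted differently depending on where a hit lives. Using the summed values (docFreq 100 over 2000 documents) gives every hit the same weight.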
|
2005-04-19 23:28:32 -04:00
|
|
|
|
2005-05-04 19:34:52 -04:00
|
|
|
5. When opening an IndexWriter with create=true, Lucene now only deletes
|
|
|
|
its own files from the index directory (looking at the file name suffixes
|
|
|
|
to decide if a file belongs to Lucene). The old behavior was to delete
|
2005-06-10 03:30:15 -04:00
|
|
|
all files. (Daniel Naber and Bernhard Messer, bug #34695)
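
The suffix check can be sketched as follows. This is a hedged illustration: the class name IndexFileNames and the suffix list are assumptions chosen for the example (an incomplete subset), not the list Lucene actually uses.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical filter deciding whether a file name looks like an index file. */
final class IndexFileNames {
    // Illustrative subset only; the authoritative list lives in Lucene's source.
    private static final Set<String> SUFFIXES = new HashSet<String>(
        Arrays.asList("cfs", "fnm", "fdx", "fdt", "tii", "tis", "frq", "prx", "del"));

    static boolean looksLikeIndexFile(String name) {
        if (name.equals("segments") || name.equals("deletable")) {
            return true; // well-known files without a suffix
        }
        int dot = name.lastIndexOf('.');
        return dot >= 0 && SUFFIXES.contains(name.substring(dot + 1));
    }
}
```

Under such a filter, a stray notes.txt in the index directory survives create=true, while _1.cfs is removed.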
|
2005-06-07 18:13:44 -04:00
|
|
|
|
|
|
|
6. The version of an IndexReader, as returned by getCurrentVersion()
|
|
|
|
and getVersion() doesn't start at 0 anymore for new indexes. Instead, it
|
|
|
|
is now initialized by the system time in milliseconds.
|
|
|
|
(Bernhard Messer via Daniel Naber)
|
2005-05-04 19:34:52 -04:00
|
|
|
|
2004-11-19 15:39:02 -05:00
|
|
|
New features
|
|
|
|
|
|
|
|
1. Added support for stored compressed fields (patch #31149)
|
|
|
|
(Bernhard Messer via Christoph)
|
|
|
|
|
|
|
|
2. Added support for binary stored fields (patch #29370)
|
|
|
|
(Drew Farris and Bernhard Messer via Christoph)
|
2004-08-13 15:33:25 -04:00
|
|
|
|
2004-11-19 15:39:02 -05:00
|
|
|
3. Added support for position and offset information in term vectors
|
|
|
|
(patch #18927). (Grant Ingersoll & Christoph)
|
|
|
|
|
2004-11-19 15:53:24 -05:00
|
|
|
4. A new class DateTools has been added. It allows you to format dates
|
2004-09-05 18:09:26 -04:00
|
|
|
in a readable format adequate for indexing. Unlike the existing
|
|
|
|
DateField class DateTools can cope with dates before 1970 and it
|
|
|
|
forces you to specify the desired date resolution (e.g. month, day,
|
|
|
|
second, ...) which can make RangeQuerys on those fields more efficient.
|
|
|
|
(Daniel Naber)
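
The effect of choosing a resolution can be sketched without DateTools itself. The helper below (DateKeys, a hypothetical name) formats a date in GMT with a caller-chosen pattern; it only approximates the idea and is not the DateTools implementation.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

/** Hypothetical helper producing sortable date strings at a chosen resolution. */
final class DateKeys {
    static String format(Date date, String pattern) {
        // Locale.ROOT and GMT keep the output independent of the host settings.
        SimpleDateFormat formatter = new SimpleDateFormat(pattern, Locale.ROOT);
        formatter.setTimeZone(TimeZone.getTimeZone("GMT"));
        return formatter.format(date);
    }
}
```

Indexing with "yyyyMM" instead of "yyyyMMddHHmmss" yields far fewer distinct terms, so a range query over that field expands to fewer clauses. Dates before 1970 (negative millisecond values) format like any other date.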
|
2004-11-19 15:39:02 -05:00
|
|
|
|
2004-11-19 15:53:24 -05:00
|
|
|
5. QueryParser now correctly works with Analyzers that can return more
|
2004-11-19 15:39:02 -05:00
|
|
|
than one token per position. For example, a query "+fast +car"
|
|
|
|
would be parsed as "+fast +(car automobile)" if the Analyzer
|
|
|
|
returns "car" and "automobile" at the same position whenever it
|
|
|
|
finds "car" (Patch #23307).
|
|
|
|
(Pierrick Brihaye, Daniel Naber)
|
|
|
|
|
2004-11-19 15:53:24 -05:00
|
|
|
6. Permit unbuffered Directory implementations (e.g., using mmap).
|
2004-09-16 17:13:37 -04:00
|
|
|
InputStream is replaced by the new classes IndexInput and
|
2004-09-28 14:15:52 -04:00
|
|
|
BufferedIndexInput. OutputStream is replaced by the new classes
|
|
|
|
IndexOutput and BufferedIndexOutput. InputStream and OutputStream
|
|
|
|
are now deprecated and FSDirectory is now subclassable. (cutting)
|
2004-08-17 16:38:46 -04:00
|
|
|
|
2004-11-19 15:53:24 -05:00
|
|
|
7. Add native Directory and TermDocs implementations that work under
|
2004-09-29 12:54:44 -04:00
|
|
|
GCJ. These require GCC 3.4.0 or later and have only been tested
|
|
|
|
on Linux. Use 'ant gcj' to build demo applications. (cutting)
|
|
|
|
|
2004-11-19 15:53:24 -05:00
|
|
|
8. Add MMapDirectory, which uses nio to mmap input files. This is
|
2004-09-29 12:54:44 -04:00
|
|
|
still somewhat slower than FSDirectory. However it uses less
|
|
|
|
memory per query term, since a new buffer is not allocated per
|
|
|
|
term, which may help applications which use, e.g., wildcard
|
2004-10-04 15:45:27 -04:00
|
|
|
queries. It may also someday be faster. (cutting & Paul Elschot)
|
2004-09-29 12:54:44 -04:00
|
|
|
|
2004-11-19 15:53:24 -05:00
|
|
|
9. Added javadocs-internal to build.xml - bug #30360
|
2004-11-19 15:39:02 -05:00
|
|
|
(Paul Elschot via Otis)
|
2004-11-23 09:17:18 -05:00
|
|
|
|
2004-11-23 16:07:49 -05:00
|
|
|
10. Added RangeFilter, a more generically useful filter than DateFilter.
|
|
|
|
(Chris M Hostetter via Erik)
|
|
|
|
|
|
|
|
11. Added NumberTools, a utility class for indexing numeric fields.
|
|
|
|
(adapted from code contributed by Matt Quail; committed by Erik)
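
Numbers need a special string form because index terms compare lexicographically: "3" sorts after "20". The sketch below shows one way to get strings whose order matches numeric order. It is not the encoding NumberTools uses; SortableLong is a hypothetical class and the fixed-width hex scheme is an assumption chosen for brevity.

```java
import java.util.Locale;

/** Hypothetical encoder: fixed-width hex whose string order equals numeric order. */
final class SortableLong {
    static String encode(long value) {
        // Flipping the sign bit maps Long.MIN_VALUE..Long.MAX_VALUE onto
        // 0..2^64-1, so negative numbers sort before positive ones.
        return String.format(Locale.ROOT, "%016x", value ^ Long.MIN_VALUE);
    }
}
```

Because every encoded value has the same width, range queries over such a field behave like numeric ranges.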
|
2004-12-30 08:04:13 -05:00
|
|
|
|
2005-02-04 12:46:21 -05:00
|
|
|
12. Added public static IndexReader.main(String[] args) method.
|
|
|
|
IndexReader can now be used directly at command line level
|
2005-01-21 13:28:35 -05:00
|
|
|
to list and optionally extract the individual files from an existing
|
2004-12-30 08:04:13 -05:00
|
|
|
compound index file.
|
|
|
|
(adapted from code contributed by Garrett Rooney; committed by Bernhard)
|
2004-11-25 11:57:11 -05:00
|
|
|
|
2005-03-09 13:58:26 -05:00
|
|
|
13. Add IndexWriter.setTermIndexInterval() method. See javadocs.
|
|
|
|
(Doug Cutting)
|
|
|
|
|
2005-03-12 05:38:51 -05:00
|
|
|
14. Added LucenePackage, whose static get() method returns java.util.Package,
|
2005-03-11 22:23:13 -05:00
|
|
|
which lets the caller get the Lucene version information specified in
|
|
|
|
the Lucene Jar.
|
|
|
|
(Doug Cutting via Otis)
|
2005-04-25 20:21:53 -04:00
|
|
|
|
|
|
|
15. Added Hits.iterator() method and corresponding HitIterator and Hit objects.
|
|
|
|
This provides standard java.util.Iterator iteration over Hits.
|
|
|
|
Each call to the iterator's next() method returns a Hit object.
|
|
|
|
(Jeremy Rayner via Erik)
|
2005-03-09 13:58:26 -05:00
|
|
|
|
2005-05-12 13:59:41 -04:00
|
|
|
16. Add ParallelReader, an IndexReader that combines separate indexes
|
|
|
|
over different fields into a single virtual index. (Doug Cutting)
|
|
|
|
|
2005-06-02 12:48:40 -04:00
|
|
|
17. Add IntParser and FloatParser interfaces to FieldCache, so that
|
|
|
|
fields in arbitrary formats can be cached as ints and floats.
|
|
|
|
(Doug Cutting)
|
2005-06-06 18:29:30 -04:00
|
|
|
|
|
|
|
18. Added class org.apache.lucene.index.IndexModifier which combines
|
|
|
|
IndexWriter and IndexReader, so you can add and delete documents without
|
|
|
|
worrying about synchronisation/locking issues.
|
|
|
|
(Daniel Naber)
|
2005-06-02 12:48:40 -04:00
|
|
|
|
2004-11-19 15:39:02 -05:00
|
|
|
API Changes
|
|
|
|
|
|
|
|
1. Several methods and fields have been deprecated. The API documentation
|
|
|
|
contains information about the recommended replacements. It is planned
|
2004-11-28 10:42:17 -05:00
|
|
|
that most of the deprecated methods and fields will be removed in
|
|
|
|
Lucene 2.0. (Daniel Naber)
|
2004-11-19 15:39:02 -05:00
|
|
|
|
2005-04-27 06:26:22 -04:00
|
|
|
2. The Russian and the German analyzers have been moved to contrib/analyzers.
|
2004-11-19 15:39:02 -05:00
|
|
|
Also, the WordlistLoader class has been moved one level up in the
|
|
|
|
hierarchy and is now org.apache.lucene.analysis.WordlistLoader
|
|
|
|
(Daniel Naber)
|
|
|
|
|
|
|
|
3. The API contained methods that were declared to throw an IOException
|
|
|
|
but never actually did so. These declarations have been removed. If
|
|
|
|
your code tries to catch these exceptions you might need to remove
|
|
|
|
those catch clauses to avoid compile errors. (Daniel Naber)
|
|
|
|
|
|
|
|
4. Add a serializable Parameter Class to standardize parameter enum
|
|
|
|
classes in BooleanClause and Field. (Christoph)
|
|
|
|
|
|
|
|
Bug fixes
|
|
|
|
|
2004-12-06 15:04:01 -05:00
|
|
|
1. The JSP demo page (src/jsp/results.jsp) now properly closes the
|
2004-11-19 16:04:17 -05:00
|
|
|
IndexSearcher it opens. (Daniel Naber)
|
|
|
|
|
2004-12-06 15:04:01 -05:00
|
|
|
2. Fixed a bug in IndexWriter.addIndexes(IndexReader[] readers) that
|
2004-11-19 15:39:02 -05:00
|
|
|
prevented deletion of obsolete segments. (Christoph Goller)
|
2004-12-19 10:23:32 -05:00
|
|
|
|
|
|
|
3. Fix in FieldInfos to avoid the return of an extra blank field in
|
|
|
|
IndexReader.getFieldNames() (Patch #19058). (Mark Harwood via Bernhard)
|
2004-11-25 14:14:26 -05:00
|
|
|
|
2005-01-20 16:20:55 -05:00
|
|
|
4. Some combinations of BooleanQuery and MultiPhraseQuery (formerly
|
|
|
|
PhrasePrefixQuery) could provoke UnsupportedOperationException
|
|
|
|
(bug #33161). (Rhett Sutphin via Daniel Naber)
|
2005-01-24 15:31:56 -05:00
|
|
|
|
|
|
|
5. Fixed a small bug in skipTo() of ConjunctionScorer that caused a NullPointerException
|
|
|
|
if skipTo() was called without a prior call to next(). (Christoph)
|
2005-01-20 16:20:55 -05:00
|
|
|
|
2005-03-07 14:26:27 -05:00
|
|
|
6. Disable Similarity.coord() in the scoring of most automatically
|
|
|
|
generated boolean queries. The coord() score factor is
|
|
|
|
appropriate when clauses are independently specified by a user,
|
|
|
|
but is usually not appropriate when clauses are generated
|
|
|
|
automatically, e.g., by a fuzzy, wildcard or range query. Matches
|
|
|
|
on such automatically generated queries are no longer penalized
|
|
|
|
for not matching all terms. (Doug Cutting, Patch #33472)
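
A small numeric sketch shows the size of the penalty that was removed. It assumes the usual default definition of the coordination factor, overlap / maxOverlap; the class name Coord is hypothetical.

```java
/** Hypothetical illustration of the coordination factor. */
final class Coord {
    /** Fraction of the query's clauses that a document matched. */
    static float coord(int overlap, int maxOverlap) {
        return overlap / (float) maxOverlap;
    }
}
```

A wildcard query that expands to 50 terms, of which a document can realistically match only one, would multiply that document's score by Coord.coord(1, 50). Disabling coord() for generated queries amounts to using a factor of 1 instead.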
|
|
|
|
|
2005-06-01 16:10:58 -04:00
|
|
|
7. Getting a lock file with Lock.obtain(long) was supposed to wait for
|
|
|
|
a given number of milliseconds, but this didn't work.
|
|
|
|
(John Wang via Daniel Naber, Bug #33799)
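
The intended behavior is a poll-until-deadline loop. The sketch below is an assumption about the general shape, not the Lock class itself: TimedLock and Attempt are hypothetical names, and a timeout is reported by returning false, whereas real code may signal it differently.

```java
/** Hypothetical poll-with-timeout loop for acquiring a lock. */
final class TimedLock {
    /** One non-blocking attempt to take the lock, e.g. an atomic file creation. */
    interface Attempt {
        boolean tryOnce();
    }

    static boolean obtain(Attempt attempt, long timeoutMillis, long pollMillis) {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (true) {
            if (attempt.tryOnce()) {
                return true;
            }
            if (System.currentTimeMillis() >= deadline) {
                return false; // timed out
            }
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve interrupt status
                return false;
            }
        }
    }
}
```

The first attempt happens immediately, so an uncontended lock is obtained without any sleep.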
|
|
|
|
|
2005-06-02 12:57:10 -04:00
|
|
|
8. Fix FSDirectory.createOutput() to always create new files.
|
|
|
|
Previously, existing files were overwritten, and an index could be
|
|
|
|
corrupted when the old version of a file was longer than the new.
|
|
|
|
Now any existing file is first removed. (Doug Cutting)
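
The corruption scenario is easy to reproduce with plain java.io: writing a shorter payload into an existing file opened in "rw" mode leaves the old tail in place. The sketch below uses a hypothetical class, FreshOutput, and is not FSDirectory's code.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;

/** Hypothetical demonstration of why an existing file is removed first. */
final class FreshOutput {
    /** Writes data at offset 0 and returns the resulting file length. */
    static long write(File file, byte[] data, boolean removeExisting) {
        try {
            if (removeExisting && file.exists() && !file.delete()) {
                throw new IOException("cannot delete " + file);
            }
            // "rw" mode does not truncate: bytes past data.length survive.
            RandomAccessFile out = new RandomAccessFile(file, "rw");
            try {
                out.write(data);
            } finally {
                out.close();
            }
            return file.length();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Writing 4 bytes over a 10-byte file without removal still reports a length of 10; with removal it reports 4.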
|
|
|
|
|
2005-06-02 20:22:47 -04:00
|
|
|
9. Fix BooleanQuery containing nested SpanTermQuery's, which previously
|
|
|
|
could return an incorrect number of hits.
|
|
|
|
(Reece Wilton via Erik Hatcher, Bug #35157)
|
|
|
|
|
2004-11-19 15:39:02 -05:00
|
|
|
Optimizations
|
|
|
|
|
|
|
|
1. Disk usage (peak requirements during indexing and optimization)
|
|
|
|
when using the compound file format has been reduced.
|
|
|
|
(Bernhard, Dmitry, and Christoph)
|
|
|
|
|
|
|
|
2. Optimize the performance of certain uses of BooleanScorer,
|
2004-09-29 12:54:44 -04:00
|
|
|
TermScorer and IndexSearcher. In particular, a BooleanQuery
|
|
|
|
composed of TermQuery, with not all terms required, that returns a
|
|
|
|
TopDocs (e.g., through a Hits with no Sort specified) runs much
|
|
|
|
faster. (cutting)
|
2004-09-30 05:19:43 -04:00
|
|
|
|
2004-11-19 15:39:02 -05:00
|
|
|
3. Removed synchronization from reading of term vectors with an
|
2004-10-06 08:15:05 -04:00
|
|
|
IndexReader (Patch #30736). (Bernhard Messer via Christoph)
|
2004-09-20 14:14:25 -04:00
|
|
|
|
2004-11-19 15:39:02 -05:00
|
|
|
4. Optimize term-dictionary lookup to allocate far fewer terms when
|
2004-10-08 11:58:49 -04:00
|
|
|
scanning for the matching term. This speeds searches involving
|
|
|
|
low-frequency terms, where the cost of dictionary lookup can be
|
|
|
|
significant. (cutting)
|
|
|
|
|
2004-11-19 15:39:02 -05:00
|
|
|
5. Optimize fuzzy queries so the standard fuzzy queries with a prefix
|
2004-11-07 18:31:16 -05:00
|
|
|
of 0 now run 20-50% faster (Patch #31882).
|
|
|
|
(Jonathan Hager via Daniel Naber)
|
2005-01-24 15:31:56 -05:00
|
|
|
|
2005-02-04 14:09:53 -05:00
|
|
|
6. A new version of BooleanScorer (BooleanScorer2) has been added that delivers
|
|
|
|
documents in increasing order and implements skipTo. For queries
|
|
|
|
with required or forbidden clauses it may be faster than the old
|
|
|
|
BooleanScorer; for BooleanQueries consisting only of optional
|
|
|
|
clauses it is probably slower. The new BooleanScorer is now the
|
2005-01-24 15:31:56 -05:00
|
|
|
default. (Patch #31785 by Paul Elschot via Christoph)
|
2004-11-07 18:31:16 -05:00
|
|
|
|
2005-02-04 14:09:53 -05:00
|
|
|
7. Use uncached access to norms when merging to reduce RAM usage.
|
|
|
|
(Bug #32847). (Doug Cutting)
|
|
|
|
|
2005-03-07 15:28:04 -05:00
|
|
|
8. Don't read term index when random-access is not required. This
|
|
|
|
reduces time to open IndexReaders and they use less memory when
|
|
|
|
random access is not required, e.g., when merging segments. The
|
|
|
|
term index is now read into memory lazily at the first
|
|
|
|
random-access. (Doug Cutting)
|
|
|
|
|
2005-06-02 13:05:58 -04:00
|
|
|
9. Optimize IndexWriter.addIndexes(Directory[]) when the number of
|
|
|
|
added indexes is larger than mergeFactor. Previously this could
|
|
|
|
result in quadratic performance. Now performance is n log(n).
|
|
|
|
(Doug Cutting)
|
|
|
|
|
2005-05-04 19:26:00 -04:00
|
|
|
Infrastructure
|
2005-02-02 15:29:56 -05:00
|
|
|
|
2005-02-04 14:09:53 -05:00
|
|
|
1. Lucene's source code repository has been converted from CVS to
|
|
|
|
Subversion. The new repository is at
|
|
|
|
http://svn.apache.org/repos/asf/lucene/java/trunk
|
2004-11-07 18:31:16 -05:00
|
|
|
|
2005-05-04 19:26:00 -04:00
|
|
|
|
2004-12-06 15:04:01 -05:00
|
|
|
1.4.3
|
|
|
|
|
|
|
|
1. The JSP demo page (src/jsp/results.jsp) now properly escapes error
|
|
|
|
messages which might contain user input (e.g. error messages about
|
|
|
|
query parsing). If you used that page as a starting point for your
|
|
|
|
own code please make sure your code also properly escapes HTML
|
|
|
|
characters from user input in order to avoid so-called cross site
|
|
|
|
scripting attacks. (Daniel Naber)
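
For code derived from the demo page, escaping can be as small as the sketch below. HtmlEscape is a hypothetical helper written for this note, not the code used in results.jsp.

```java
/** Hypothetical helper that escapes text before it is written into HTML. */
final class HtmlEscape {
    static String escape(String text) {
        StringBuilder out = new StringBuilder(text.length() + 16);
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '&':  out.append("&amp;");  break;
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '"':  out.append("&quot;"); break;
                case '\'': out.append("&#39;");  break;
                default:   out.append(c);
            }
        }
        return out.toString();
    }
}
```

Any string that echoes user input, including exception messages from query parsing, should pass through such a function before being printed.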
|
|
|
|
|
|
|
|
2. QueryParser changes in 1.4.2 broke the QueryParser API. Now the old
|
|
|
|
API is supported again. (Christoph)
|
|
|
|
|
|
|
|
|
2004-11-19 15:53:24 -05:00
|
|
|
1.4.2
|
|
|
|
|
|
|
|
1. Fixed bug #31241: Sorting could lead to incorrect results (documents
|
|
|
|
missing, others duplicated) if the sort keys were not unique and there
|
|
|
|
were more than 100 matches. (Daniel Naber)
|
|
|
|
|
|
|
|
2. Memory leak in Sort code (bug #31240) eliminated.
|
|
|
|
(Rafal Krzewski via Christoph and Daniel)
|
|
|
|
|
|
|
|
3. FuzzyQuery now takes an additional parameter that specifies the
|
|
|
|
minimum similarity that is required for a term to match the query.
|
|
|
|
The QueryParser syntax for this is term~x, where x is a floating
|
|
|
|
point number >= 0 and < 1 (a bigger number means that a higher
|
|
|
|
similarity is required). Furthermore, a prefix can be specified
|
|
|
|
for FuzzyQuerys so that only those terms are considered similar that
|
|
|
|
start with this prefix. This can speed up FuzzyQuery greatly.
|
|
|
|
(Daniel Naber, Christoph Goller)
|
|
|
|
|
|
|
|
4. PhraseQuery and PhrasePrefixQuery now allow the explicit specification
|
|
|
|
of relative positions. (Christoph Goller)
|
|
|
|
|
|
|
|
5. QueryParser changes: Fix for ArrayIndexOutOfBoundsExceptions
|
|
|
|
(patch #9110); some unused method parameters removed; The ability
|
|
|
|
to specify a minimum similarity for FuzzyQuery has been added.
|
|
|
|
(Christoph Goller)
|
|
|
|
|
|
|
|
6. IndexSearcher optimization: a new ScoreDoc is no longer allocated
|
|
|
|
for every non-zero-scoring hit. This makes 'OR' queries that
|
|
|
|
contain common terms substantially faster. (cutting)
|
|
|
|
|
|
|
|
|
2004-08-02 16:53:14 -04:00
|
|
|
1.4.1
|
2004-07-21 15:05:46 -04:00
|
|
|
|
|
|
|
1. Fixed a performance bug in hit sorting code, where values were not
|
|
|
|
correctly cached. (Aviran via cutting)
|
|
|
|
|
2004-08-07 07:36:39 -04:00
|
|
|
2. Fixed errors in file format documentation. (Daniel Naber)
|
2004-08-02 16:53:14 -04:00
|
|
|
|
2004-07-21 15:05:46 -04:00
|
|
|
|
2004-07-01 13:40:41 -04:00
|
|
|
1.4 final
|
2004-05-20 12:38:58 -04:00
|
|
|
|
|
|
|
1. Added "an" to the list of stop words in StopAnalyzer, to complement
|
|
|
|
the existing "a" there. Fix for bug 28960
|
|
|
|
(http://issues.apache.org/bugzilla/show_bug.cgi?id=28960). (Otis)
|
|
|
|
|
2004-05-20 13:16:56 -04:00
|
|
|
2. Added new class FieldCache to manage in-memory caches of field term
|
|
|
|
values. (Tim Jones)
|
|
|
|
|
2004-05-22 13:34:31 -04:00
|
|
|
3. Added overloaded getFieldQuery method to QueryParser which
|
|
|
|
accepts the slop factor specified for the phrase (or the default
|
|
|
|
phrase slop for the QueryParser instance). This allows overriding
|
|
|
|
methods to replace a PhraseQuery with a SpanNearQuery instead,
|
|
|
|
keeping the proper slop factor. (Erik Hatcher)
|
|
|
|
|
2004-05-30 16:24:20 -04:00
|
|
|
4. Changed the encoding of GermanAnalyzer.java and GermanStemmer.java to
|
|
|
|
UTF-8 and changed the build encoding to UTF-8, to make changed files
|
|
|
|
compile. (Otis Gospodnetic)
|
|
|
|
|
2004-06-07 12:55:52 -04:00
|
|
|
5. Removed synchronization from term lookup under IndexReader methods
|
|
|
|
termFreq(), termDocs() or termPositions() to improve
|
|
|
|
multi-threaded performance. (cutting)
|
|
|
|
|
2004-06-09 07:28:46 -04:00
|
|
|
6. Fix a bug where obsolete segment files were not deleted on Win32.
|
|
|
|
|
2004-05-22 13:34:31 -04:00
|
|
|
|
2004-04-27 18:04:50 -04:00
|
|
|
1.4 RC3
|
|
|
|
|
|
|
|
1. Fixed several search bugs introduced by the skipTo() changes in
|
2004-05-11 16:12:43 -04:00
|
|
|
release 1.4RC1. The index file format was changed a bit, so
|
|
|
|
collections must be re-indexed to take advantage of the skipTo()
|
|
|
|
optimizations. (Christoph Goller)
|
2004-04-27 18:04:50 -04:00
|
|
|
|
|
|
|
2. Added new Document methods, removeField() and removeFields().
|
|
|
|
(Christoph Goller)
|
|
|
|
|
|
|
|
3. Fixed inconsistencies with index closing. Indexes and directories
|
|
|
|
are now only closed automatically by Lucene when Lucene opened
|
|
|
|
them automatically. (Christoph Goller)
|
|
|
|
|
|
|
|
4. Added new class: FilteredQuery. (Tim Jones)
|
|
|
|
|
|
|
|
5. Added a new SortField type for custom comparators. (Tim Jones)
|
|
|
|
|
2004-05-09 08:52:00 -04:00
|
|
|
6. Lock obtain timed out message now displays the full path to the lock
|
|
|
|
file. (Daniel Naber via Erik)
|
|
|
|
|
2004-05-11 16:12:43 -04:00
|
|
|
7. Fixed a bug in SpanNearQuery when ordered. (Paul Elschot via cutting)
|
|
|
|
|
|
|
|
8. Fixed so that FSDirectory's locks still work when the
|
|
|
|
java.io.tmpdir system property is null. (cutting)
|
|
|
|
|
2004-05-20 12:28:15 -04:00
|
|
|
9. Changed FilteredTermEnum's constructor to take no parameters,
|
|
|
|
as the parameters were ignored anyway (bug #28858)
|
2004-04-27 18:04:50 -04:00
|
|
|
|
2004-03-30 12:40:08 -05:00
|
|
|
1.4 RC2
|
|
|
|
|
2004-04-22 04:30:44 -04:00
|
|
|
1. GermanAnalyzer now throws an exception if the stopword file
|
|
|
|
cannot be found (bug #27987). It now uses LowerCaseFilter
|
|
|
|
(bug #18410) (Daniel Naber via Otis, Erik)
|
2004-03-30 12:40:08 -05:00
|
|
|
|
2004-03-30 12:48:45 -05:00
|
|
|
2. Fixed a few bugs in the file format documentation. (cutting)
|
|
|
|
|
2004-03-30 12:40:08 -05:00
|
|
|
|
2004-01-15 17:42:35 -05:00
|
|
|
1.4 RC1
|
|
|
|
|
|
|
|
1. Changed the format of the .tis file, so that:
|
|
|
|
|
|
|
|
- it has a format version number, which makes it easier to
|
|
|
|
back-compatibly change file formats in the future.
|
|
|
|
|
|
|
|
- the term count is now stored as a long. This was the one aspect
|
|
|
|
of Lucene's file formats which limited index size.
|
|
|
|
|
|
|
|
- a few internal index parameters are now stored in the index, so
|
|
|
|
that they can (in theory) now be changed from index to index,
|
|
|
|
although there is not yet an API to do so.
|
|
|
|
|
|
|
|
These changes are back compatible. The new code can read old
|
|
|
|
indexes. But old code will not be able to read new indexes. (cutting)
|
|
|
|
|
|
|
|
2. Added an optimized implementation of TermDocs.skipTo(). A skip
|
|
|
|
table is now stored for each term in the .frq file. This only
|
|
|
|
adds a percent or two to overall index size, but can substantially
|
|
|
|
speed up many searches. (cutting)
|
|
|
|
|
|
|
|
3. Restructured the Scorer API and all Scorer implementations to take
|
|
|
|
advantage of an optimized TermDocs.skipTo() implementation. In
|
|
|
|
particular, PhraseQuerys and conjunctive BooleanQuerys are
|
|
|
|
faster when one clause has substantially fewer matches than the
|
|
|
|
others. (A conjunctive BooleanQuery is a BooleanQuery where all
|
|
|
|
clauses are required.) (cutting)
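
Why a rare clause speeds up a conjunction can be sketched with two sorted doc-id lists. This is an illustration under simplifying assumptions, not the Scorer API: Conjunction and Postings are hypothetical names, doc ids are assumed to be smaller than Integer.MAX_VALUE, and advance() scans linearly where a real skip table would jump.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical leapfrog intersection driven by the rarer posting list. */
final class Conjunction {
    static final int NO_MORE = Integer.MAX_VALUE;

    /** Cursor over a sorted doc-id list; a stand-in for one term's postings. */
    static final class Postings {
        private final int[] docs;
        private int pos = 0;
        Postings(int[] sortedDocs) {
            this.docs = sortedDocs;
        }
        /** Moves to the first doc >= target and returns it, or NO_MORE. */
        int advance(int target) {
            while (pos < docs.length && docs[pos] < target) {
                pos++;
            }
            return pos < docs.length ? docs[pos] : NO_MORE;
        }
    }

    static List<Integer> intersect(int[] rare, int[] common) {
        Postings a = new Postings(rare);
        Postings b = new Postings(common);
        List<Integer> hits = new ArrayList<Integer>();
        int doc = a.advance(0);
        while (doc != NO_MORE) {
            int other = b.advance(doc);
            if (other == doc) {
                hits.add(doc);
                doc = a.advance(doc + 1);
            } else {
                doc = a.advance(other); // skip everything the other list lacks
            }
        }
        return hits;
    }
}
```

The loop runs roughly once per entry of the rare list, so most of the common list is never visited one by one when skipping is cheap.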
|
|
|
|
|
2004-01-20 13:37:09 -05:00
|
|
|
4. Added new class ParallelMultiSearcher. Combined with
|
|
|
|
RemoteSearchable this makes it easy to implement distributed
|
|
|
|
search systems. (Jean-Francois Halleux via cutting)
|
|
|
|
|
2004-02-17 14:00:31 -05:00
|
|
|
5. Added support for hit sorting. Results may now be sorted by any
|
|
|
|
indexed field. For details see the javadoc for
|
|
|
|
Searcher#search(Query, Sort). (Tim Jones via Cutting)
|
2004-01-30 12:07:53 -05:00
|
|
|
|
|
|
|
6. Changed FSDirectory to auto-create a full directory tree that it
|
|
|
|
needs by using mkdirs() instead of mkdir(). (Mladen Turk via Otis)
|
|
|
|
|
2004-01-30 17:10:00 -05:00
|
|
|
7. Added a new span-based query API. This implements, among other
|
|
|
|
things, nested phrases. See javadocs for details. (Doug Cutting)
|
|
|
|
|
2004-02-06 14:19:20 -05:00
|
|
|
8. Added new method Query.getSimilarity(Searcher), and changed
|
|
|
|
scorers to use it. This permits one to subclass a Query class so
|
|
|
|
that it can specify its own Similarity implementation, perhaps
|
2004-02-06 15:56:45 -05:00
|
|
|
one that delegates through that of the Searcher. (Julien Nioche
|
|
|
|
via Cutting)
|
2004-02-06 14:19:20 -05:00
|
|
|
|
2004-02-19 13:28:59 -05:00
|
|
|
9. Added MultiReader, an IndexReader that combines multiple other
|
|
|
|
IndexReaders. (Cutting)
|
2004-01-15 17:42:35 -05:00
|
|
|
|
2004-02-20 15:14:56 -05:00
|
|
|
10. Added support for term vectors. See Field#isTermVectorStored().
|
|
|
|
(Grant Ingersoll, Cutting & Dmitry)
|
|
|
|
|
2004-03-03 06:24:49 -05:00
|
|
|
11. Fixed the old bug with escaping of special characters in query
|
|
|
|
strings: http://issues.apache.org/bugzilla/show_bug.cgi?id=24665
|
|
|
|
(Jean-Francois Halleux via Otis)
|
|
|
|
|
2004-03-18 14:05:18 -05:00
|
|
|
12. Added support for overriding default values for the following,
|
|
|
|
using system properties:
|
|
|
|
- default commit lock timeout
|
|
|
|
- default maxFieldLength
|
|
|
|
- default maxMergeDocs
|
|
|
|
- default mergeFactor
|
|
|
|
- default minMergeDocs
|
|
|
|
- default write lock timeout
|
|
|
|
(Otis)
|
|
|
|
|
2004-03-24 05:12:27 -05:00
|
|
|
13. Changed QueryParser.jj to allow '-' and '+' within tokens:
|
|
|
|
http://issues.apache.org/bugzilla/show_bug.cgi?id=27491
|
|
|
|
(Morus Walter via Otis)
|
|
|
|
|
2004-03-24 13:10:59 -05:00
|
|
|
14. Changed so that the compound index format is used by default.
|
|
|
|
This makes indexing a bit slower, but vastly reduces the chances
|
|
|
|
of file handle problems. (Cutting)
|
|
|
|
|
2004-02-20 15:14:56 -05:00
|
|
|
|
2003-12-26 13:05:27 -05:00
|
|
|
1.3 final
|
2003-11-26 06:10:54 -05:00
|
|
|
|
|
|
|
1. Added catch of BooleanQuery$TooManyClauses in QueryParser to
|
|
|
|
throw ParseException instead. (Erik Hatcher)
|
|
|
|
|
2003-12-15 18:04:42 -05:00
|
|
|
2. Fixed a NullPointerException in Query.explain(). (Doug Cutting)
|
|
|
|
|
|
|
|
3. Added a new method IndexReader.setNorm(), that permits one to
|
|
|
|
alter the boosting of fields after an index is created.
|
|
|
|
|
2003-12-22 16:42:48 -05:00
|
|
|
4. Distinguish between the final position and length when indexing a
|
|
|
|
field. The length is now defined as the total number of tokens,
|
|
|
|
instead of the final position, as it was previously. Length is
|
|
|
|
used for score normalization (Similarity.lengthNorm()) and for
|
|
|
|
controlling memory usage (IndexWriter.maxFieldLength). In both of
|
|
|
|
these cases, the total number of tokens is a better value to use
|
|
|
|
than the final token position. Position is used in phrase
|
|
|
|
searching (see PhraseQuery and Token.setPositionIncrement()).
|
|
|
|
|
2003-12-22 17:12:24 -05:00
|
|
|
5. Fix StandardTokenizer's handling of CJK characters (Chinese,
|
|
|
|
Japanese and Korean ideograms). Previously contiguous sequences
|
|
|
|
were combined in a single token, which is not very useful. Now
|
|
|
|
each ideogram generates a separate token, which is more useful.
|
|
|
|
|
2003-12-15 18:04:42 -05:00
|
|
|
|
2003-11-18 07:00:12 -05:00
|
|
|
1.3 RC3
|
|
|
|
|
2003-11-25 16:56:08 -05:00
|
|
|
1. Added minMergeDocs in IndexWriter. This can be raised to speed
|
|
|
|
indexing without altering the number of files, but only using more
|
|
|
|
memory. (Julien Nioche via Otis)
|
|
|
|
|
|
|
|
2. Fix bug #24786, in query rewriting. (bschneeman via Cutting)
|
|
|
|
|
|
|
|
3. Fix bug #16952, in demo HTML parser, skip comments in
|
|
|
|
javascript. (Christoph Goller)
|
|
|
|
|
|
|
|
4. Fix bug #19253, in demo HTML parser, add whitespace as needed to
|
|
|
|
output (Daniel Naber via Christoph Goller)
|
|
|
|
|
|
|
|
5. Fix bug #24301, in demo HTML parser, long titles no longer
|
|
|
|
hang things. (Christoph Goller)
|
|
|
|
|
|
|
|
6. Fix bug #23534, Replace use of file timestamp of segments file
|
|
|
|
with an index version number stored in the segments file. This
|
|
|
|
resolves problems when running on file systems with low-resolution
|
|
|
|
timestamps, e.g., HFS under MacOS X. (Christoph Goller)
|
|
|
|
|
|
|
|
7. Fix QueryParser so that TokenMgrError is not thrown, only
|
|
|
|
ParseException. (Erik Hatcher)
|
|
|
|
|
|
|
|
8. Fix some bugs introduced by change 11 of RC2. (Christoph Goller)
|
|
|
|
|
|
|
|
9. Fixed a problem compiling TestRussianStem. (Christoph Goller)
|
|
|
|
|
|
|
|
10. Cleaned up some build stuff. (Erik Hatcher)
|
2003-11-18 07:00:12 -05:00
|
|
|
|
|
|
|
|
2003-03-20 13:38:59 -05:00
|
|
|
1.3 RC2
|
|
|
|
|
2003-04-30 21:09:15 -04:00
|
|
|
1. Added getFieldNames(boolean) to IndexReader, SegmentReader, and
|
2003-09-18 05:47:06 -04:00
|
|
|
SegmentsReader. (Julien Nioche via otis)
|
2003-03-20 13:38:59 -05:00
|
|
|
|
2003-05-01 15:50:18 -04:00
|
|
|
2. Changed file locking to place lock files in
|
|
|
|
System.getProperty("java.io.tmpdir"), where all users are
|
|
|
|
permitted to write files. This way folks can open and correctly
|
|
|
|
lock indexes which are read-only to them.
|
|
|
|
|
2003-07-11 18:13:13 -04:00
|
|
|
3. IndexWriter: added a new method, addDocument(Document, Analyzer),
|
|
|
|
permitting one to easily use different analyzers for different
|
|
|
|
documents in the same index.
|
|
|
|
|
2003-09-11 07:50:36 -04:00
|
|
|
4. Minor enhancements to FuzzyTermEnum.
|
|
|
|
(Christoph Goller via Otis)
|
|
|
|
|
|
|
|
5. PriorityQueue: added insert(Object) method and adjusted IndexSearcher
|
|
|
|
and MultiIndexSearcher to use it.
|
|
|
|
(Christoph Goller via Otis)
|
|
|
|
|
|
|
|
6. Fixed a bug in IndexWriter that returned incorrect docCount().
|
|
|
|
(Christoph Goller via Otis)
|
|
|
|
|
|
|
|
7. Fixed SegmentsReader to eliminate the confusing and slightly different
|
|
|
|
behaviour of TermEnum when dealing with an enumeration of all terms,
|
|
|
|
versus an enumeration starting from a specific term.
|
|
|
|
This patch also fixes incorrect term document frequencies when the same term
|
|
|
|
is present in multiple segments.
|
|
|
|
(Christoph Goller via Otis)
|
|
|
|
|
2003-10-03 08:37:51 -04:00
|
|
|
8. Added CachingWrapperFilter and PerFieldAnalyzerWrapper. (Erik Hatcher)
|
|
|
|
|
|
|
|
9. Added support for the new "compound file" index format (Dmitry
|
|
|
|
Serebrennikov)
|
|
|
|
|
2003-10-03 11:16:24 -04:00
|
|
|
10. Added Locale setting to QueryParser, for use by date range parsing.
|
2003-09-18 05:47:06 -04:00
|
|
|
|
2003-10-21 13:59:17 -04:00
|
|
|
11. Changed IndexReader so that it can be subclassed by classes
|
|
|
|
outside of its package. Previously it had package-private
|
|
|
|
abstract methods. Also modified the index merging code so that it
|
|
|
|
can work on an arbitrary IndexReader implementation, and added a
|
|
|
|
new method, IndexWriter.addIndexes(IndexReader[]), to take
|
|
|
|
advantage of this. (cutting)
|
|
|
|
|
|
|
|
12. Added a limit to the number of clauses which may be added to a
|
|
|
|
BooleanQuery. The default limit is 1024 clauses. This should
|
|
|
|
stop most OutOfMemoryErrors caused by prefix, wildcard and fuzzy
|
|
|
|
queries which run amok. (cutting)
|
|
|
|
|
2003-10-21 14:24:23 -04:00
|
|
|
13. Add new method: IndexReader.undeleteAll(). This undeletes all
|
|
|
|
deleted documents which still remain in the index. (cutting)
|
|
|
|
|
2003-03-20 13:38:59 -05:00
|
|
|
|
2003-03-20 13:15:04 -05:00
|
|
|
1.3 RC1
|
2002-06-02 15:15:18 -04:00
|
|
|
|
2002-06-05 14:42:46 -04:00
|
|
|
1. Fixed PriorityQueue's clear() method.
|
|
|
|
Fix for bug 9454, http://nagoya.apache.org/bugzilla/show_bug.cgi?id=9454
|
|
|
|
(Matthijs Bomhoff via otis)
|
|
|
|
|
|
|
|
2. Changed StandardTokenizer.jj grammar for EMAIL tokens.
|
|
|
|
Fix for bug 9015, http://nagoya.apache.org/bugzilla/show_bug.cgi?id=9015
|
|
|
|
(Dale Anson via otis)
|
2002-05-14 11:24:05 -04:00
|
|
|
|
2002-06-27 12:30:20 -04:00
|
|
|
3. Added the ability to disable lock creation by using disableLuceneLocks
|
|
|
|
system property. This is useful for read-only media, such as CD-ROMs.
|
2002-06-21 12:41:38 -04:00
|
|
|
(otis)
|
2002-06-29 18:34:09 -04:00
|
|
|
|
2002-06-21 12:54:00 -04:00
|
|
|
4. Added id method to Hits to be able to access the index global id.
|
2002-06-29 18:10:37 -04:00
|
|
|
Required for sorting options.
|
|
|
|
(carlson)
|
|
|
|
|
2002-06-29 18:34:09 -04:00
|
|
|
5. Added support for new range query syntax to QueryParser.jj.
|
|
|
|
(briangoetz)
|
|
|
|
|
2002-07-17 19:26:26 -04:00
|
|
|
6. Added the ability to retrieve HTML documents' META tag values to
|
|
|
|
HTMLParser.jj.
|
2002-06-29 18:10:37 -04:00
|
|
|
(Mark Harwood via otis)
|
2002-06-29 18:34:09 -04:00
|
|
|
|
2002-07-14 23:58:06 -04:00
|
|
|
7. Modified QueryParser to make it possible to programmatically specify the
|
|
|
|
default Boolean operator (OR or AND).
|
2004-05-24 15:05:21 -04:00
|
|
|
(Péter Halácsy via otis)
|
2002-07-14 23:58:06 -04:00
|
|
|
|
2002-07-17 19:26:26 -04:00
|
|
|
8. Made many search methods and classes non-final, per requests.
|
|
|
|
This includes IndexWriter and IndexSearcher, among others.
|
|
|
|
(cutting)
|
2002-07-25 02:11:35 -04:00
|
|
|
|
2002-07-17 19:26:26 -04:00
|
|
|
9. Added class RemoteSearchable, providing support for remote
|
|
|
|
searching via RMI. The test class RemoteSearchableTest.java
|
|
|
|
provides an example of how this can be used. (cutting)
|
|
|
|
|
2002-07-18 10:40:51 -04:00
|
|
|
10. Added PhrasePrefixQuery (and supporting MultipleTermPositions). The
|
|
|
|
test class TestPhrasePrefixQuery provides the usage example.
|
|
|
|
(Anders Nielsen via otis)
|
2002-07-17 19:26:26 -04:00
|
|
|
|
2002-07-26 13:32:54 -04:00
|
|
|
11. Changed the German stemming algorithm to ignore case while
|
|
|
|
stripping. The new algorithm is faster and produces more equal
|
|
|
|
stems from nouns and verbs derived from the same word.
|
2002-07-25 02:11:35 -04:00
|
|
|
(gschwarz)
|
|
|
|
|
2002-07-29 15:11:15 -04:00
|
|
|
12. Added support for boosting the score of documents and fields via
|
|
|
|
the new methods Document.setBoost(float) and Field.setBoost(float).
|
|
|
|
|
|
|
|
Note: This changes the encoding of an indexed value. Indexes
|
|
|
|
should be re-created from scratch in order for search scores to
|
|
|
|
be correct. With the new code and an old index, searches will
|
|
|
|
yield very large scores for shorter fields, and very small scores
|
|
|
|
for longer fields. Once the index is re-created, scores will be
|
|
|
|
as before. (cutting)
|
|
|
|
|
2002-08-05 13:39:03 -04:00
|
|
|
13. Added new method Token.setPositionIncrement().
|
|
|
|
|
|
|
|
This permits, for the purpose of phrase searching, placing
|
|
|
|
multiple terms in a single position. This is useful with
|
|
|
|
stemmers that produce multiple possible stems for a word.
|
|
|
|
|
|
|
|
This also permits the introduction of gaps between terms, so that
|
|
|
|
terms which are adjacent in a token stream will not be matched by
|
|
|
|
an exact phrase query. This makes it possible, e.g., to build
|
|
|
|
an analyzer where phrases are not matched over stop words which
|
|
|
|
have been removed.
|
|
|
|
|
|
|
|
Finally, repeating a token with an increment of zero can also be
|
2002-08-05 14:05:56 -04:00
|
|
|
used to boost scores of matches on that token. (cutting)
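The effect of position increments in item 13 can be sketched without any Lucene classes. PositionSketch is a hypothetical name, and the starting position of -1 is an assumption of this sketch, chosen so that a first increment of 1 yields position 0.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; this is not Lucene's Token class.
final class PositionSketch {
    // Turns per-token position increments into absolute positions.
    static List<Integer> positions(int[] increments) {
        List<Integer> result = new ArrayList<>();
        int position = -1; // assumed starting point
        for (int increment : increments) {
            position += increment;
            result.add(position);
        }
        return result;
    }

    // An exact phrase matches two consecutive terms only if their
    // positions differ by exactly one.
    static boolean adjacent(int first, int second) {
        return second - first == 1;
    }
}
```

With increments 1, 0, 1 the first two tokens share a position, which is the stemmer case described above. With increments 1, 2 the two tokens are separated by a gap, so they no longer count as adjacent for an exact phrase.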
|
|
|
|
|
|
|
|
14. Added new Filter class, QueryFilter. This constrains search
|
|
|
|
results to only match those which also match a provided query.
|
|
|
|
Results are cached, so that searches after the first on the same
|
|
|
|
index using this filter are very fast.
|
|
|
|
|
|
|
|
This could be used, for example, with a RangeQuery on a formatted
|
|
|
|
date field to implement date filtering. One could re-use a
|
|
|
|
single QueryFilter that matches, e.g., only documents modified
|
|
|
|
within the last week. The QueryFilter and RangeQuery would only
|
|
|
|
need to be reconstructed once per day. (cutting)
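The caching behaviour described in item 14 might look roughly like the sketch below. It is not the QueryFilter implementation: CachingFilterSketch, its type parameter and the computations counter are all hypothetical, and the supplied function stands in for running the query against an index.

```java
import java.util.BitSet;
import java.util.Map;
import java.util.WeakHashMap;
import java.util.function.Function;

// Hypothetical illustration of per-index result caching.
final class CachingFilterSketch<I> {
    private final Function<I, BitSet> compute;          // stands in for running the query
    private final Map<I, BitSet> cache = new WeakHashMap<>();
    int computations = 0;                               // exposed only for demonstration

    CachingFilterSketch(Function<I, BitSet> compute) {
        this.compute = compute;
    }

    // Returns the matching document numbers for the given index,
    // computing them only on the first call for that index.
    synchronized BitSet bits(I index) {
        BitSet bits = cache.get(index);
        if (bits == null) {
            computations++;
            bits = compute.apply(index);
            cache.put(index, bits);
        }
        return bits;
    }
}
```

A weak map is used here so that cached results do not keep a closed index alive; whether Lucene makes the same choice is not stated in the entry above.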
|
2002-08-05 13:39:03 -04:00
|
|
|
|
2002-08-07 12:28:08 -04:00
|
|
|
15. Added a new IndexWriter method, getAnalyzer(). This returns the
|
2002-08-08 13:56:19 -04:00
|
|
|
analyzer used when adding documents to this index. (cutting)
|
|
|
|
|
|
|
|
16. Fixed a bug with IndexReader.lastModified(). Before, document
|
|
|
|
deletion did not update this. Now it does. (cutting)
|
2002-08-07 12:28:08 -04:00
|
|
|
|
2002-09-16 00:11:36 -04:00
|
|
|
17. Added Russian Analyzer.
|
|
|
|
(Boris Okner via otis)
|
|
|
|
|
2002-11-07 12:31:27 -05:00
|
|
|
18. Added a public, extensible scoring API. For details, see the
|
|
|
|
javadoc for org.apache.lucene.search.Similarity.
|
2003-01-13 22:41:05 -05:00
|
|
|
|
2002-11-15 11:09:48 -05:00
|
|
|
19. Changed the return type of Hits.id() from float to int. (Terry Steichen via Peter).
|
2002-11-07 12:31:27 -05:00
|
|
|
|
2003-01-04 12:29:40 -05:00
|
|
|
20. Added getFieldNames() to IndexReader and Segment(s)Reader classes.
|
2003-01-04 12:16:07 -05:00
|
|
|
(Peter Mularien via otis)
|
2002-07-29 15:11:15 -04:00
|
|
|
|
2003-01-07 11:11:00 -05:00
|
|
|
21. Added getFields(String) and getValues(String) methods.
|
2003-01-13 22:41:05 -05:00
|
|
|
Contributed by Rasik Pandey on 2002-10-09
|
2003-01-07 11:11:00 -05:00
|
|
|
(Rasik Pandey via otis)
|
|
|
|
|
2003-01-13 18:50:34 -05:00
|
|
|
22. Revised internal search APIs. Changes include:
|
|
|
|
|
|
|
|
a. Queries are no longer modified during a search. This makes
|
|
|
|
it possible, e.g., to reuse the same query instance with
|
|
|
|
multiple indexes from multiple threads.
|
|
|
|
|
|
|
|
b. Term-expanding queries (e.g. PrefixQuery, WildcardQuery,
|
|
|
|
etc.) now work correctly with MultiSearcher, fixing bugs 12619
|
|
|
|
and 12667.
|
|
|
|
|
|
|
|
c. Boosting BooleanQueries now works, and is supported by the
|
|
|
|
query parser (problem reported by Lee Mallabone). Thus a query
|
|
|
|
like "(+foo +bar)^2 +baz" is now supported and equivalent to
|
|
|
|
"(+foo^2 +bar^2) +baz".
|
|
|
|
|
|
|
|
d. New method: Query.rewrite(IndexReader). This permits a
|
|
|
|
query to re-write itself as an alternate, more primitive query.
|
|
|
|
Most of the term-expanding query classes (PrefixQuery,
|
|
|
|
WildcardQuery, etc.) are now implemented using this method.
|
|
|
|
|
|
|
|
e. New method: Searchable.explain(Query q, int doc). This
|
|
|
|
returns an Explanation instance that describes how a particular
|
|
|
|
document is scored against a query. An explanation can be
|
|
|
|
displayed as either plain text, with the toString() method, or
|
|
|
|
as HTML, with the toHtml() method. Note that computing an
|
|
|
|
explanation is as expensive as executing the query over the
|
|
|
|
entire index. This is intended to be used in developing
|
|
|
|
Similarity implementations, and, for good performance, should
|
|
|
|
not be displayed with every hit.
|
|
|
|
|
|
|
|
f. Scorer and Weight are public, not package protected. It is now
|
|
|
|
possible for someone to write a Scorer implementation that is
|
|
|
|
not in the org.apache.lucene.search package. This is still
|
|
|
|
fairly advanced programming, and I don't expect anyone to do
|
|
|
|
this anytime soon, but at least now it is possible.
|
|
|
|
|
2003-01-14 14:20:30 -05:00
|
|
|
g. Added public accessors to the primitive query classes
|
|
|
|
(TermQuery, PhraseQuery and BooleanQuery), permitting access to
|
|
|
|
their terms and clauses.
|
|
|
|
|
2003-01-13 18:50:34 -05:00
|
|
|
Caution: These are extensive changes and they have not yet been
|
|
|
|
tested extensively. Bug reports are appreciated.
|
2003-01-13 22:41:05 -05:00
|
|
|
(cutting)
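The idea behind item 22.d, a term-expanding query rewriting itself into more primitive parts, can be sketched in plain Java. RewriteSketch and rewritePrefix are hypothetical names and nothing here calls Lucene; the sketch assumes the indexed terms are held in a sorted set using natural String ordering.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SortedSet;

// Hypothetical illustration; not the Query.rewrite(IndexReader) implementation.
final class RewriteSketch {
    // Expands a prefix into the exact indexed terms it covers. In natural
    // String order all terms sharing a prefix are contiguous and begin at
    // the first term that is >= the prefix, so the scan can stop early.
    static List<String> rewritePrefix(String prefix, SortedSet<String> indexedTerms) {
        List<String> expanded = new ArrayList<>();
        for (String term : indexedTerms.tailSet(prefix)) {
            if (!term.startsWith(prefix)) {
                break;
            }
            expanded.add(term);
        }
        return expanded;
    }
}
```

The resulting list plays the role of the primitive query: a search for the prefix becomes a search for any of the listed exact terms.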
|
|
|
|
|
|
|
|
23. Added convenience RAMDirectory constructors taking File and String
|
|
|
|
arguments, for easy FSDirectory to RAMDirectory conversion.
|
2003-01-14 00:03:19 -05:00
|
|
|
(otis)
|
2003-01-13 18:50:34 -05:00
|
|
|
|
2003-03-01 20:22:36 -05:00
|
|
|
24. Added code for manual renaming of files in FSDirectory, since it
|
|
|
|
has been reported that java.io.File's renameTo(File) method sometimes
|
|
|
|
fails on Windows JVMs.
|
|
|
|
(Matt Tucker via otis)
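A fallback of the kind item 24 describes could be sketched as follows. This is not the FSDirectory code: RenameSketch is a hypothetical class, and it relies on java.nio.file, which is newer than the Java versions this change log was written for.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.StandardCopyOption;

// Hypothetical illustration of a rename with a manual copy fallback.
final class RenameSketch {
    // Tries File.renameTo first. If that reports failure, copies the
    // contents to the target and then deletes the source.
    static void rename(File from, File to) throws IOException {
        if (from.renameTo(to)) {
            return;
        }
        Files.copy(from.toPath(), to.toPath(), StandardCopyOption.REPLACE_EXISTING);
        if (!from.delete()) {
            throw new IOException("Could not delete " + from);
        }
    }
}
```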
|
|
|
|
|
2003-03-01 21:48:45 -05:00
|
|
|
25. Refactored QueryParser to make it easier for people to extend it.
|
|
|
|
Added the ability to automatically lower-case Wildcard terms in
|
|
|
|
the QueryParser.
|
|
|
|
(Tatu Saloranta via otis)
|
|
|
|
|
2003-01-07 11:11:00 -05:00
|
|
|
|
2002-05-14 11:24:05 -04:00
|
|
|
1.2 RC6
|
|
|
|
|
2002-05-20 11:45:43 -04:00
|
|
|
1. Changed QueryParser.jj to have "?" be a special character which
|
2002-06-29 18:34:09 -04:00
|
|
|
allowed it to be used as a wildcard term. Updated TestWildcard
|
2002-05-20 11:45:43 -04:00
|
|
|
unit test also. (Ralf Hettesheimer via carlson)
|
2002-05-14 11:24:05 -04:00
|
|
|
|
2002-02-14 16:17:17 -05:00
|
|
|
1.2 RC5
|
|
|
|
|
2002-02-27 17:18:28 -05:00
|
|
|
1. Renamed build.properties to default.properties and updated
|
|
|
|
the BUILD.txt document to describe how to override the
|
|
|
|
default.property settings without having to edit the file. This
|
|
|
|
brings the build process closer to Scarab's build process.
|
|
|
|
(jon)
|
2002-02-14 16:17:17 -05:00
|
|
|
|
2002-02-27 18:05:15 -05:00
|
|
|
2. Added MultiFieldQueryParser class. (Kelvin Tan, via otis)
|
|
|
|
|
|
|
|
3. Updated "powered by" links. (otis)
|
|
|
|
|
2002-05-04 14:26:27 -04:00
|
|
|
4. Fixed instruction for setting up JavaCC - Bug #7017 (otis)
|
|
|
|
|
2002-06-29 18:34:09 -04:00
|
|
|
5. Added throwing exception if FSDirectory could not create directory
|
2002-05-06 14:10:55 -04:00
|
|
|
- Bug #6914 (Eugene Gluzberg via otis)
|
2002-05-04 14:26:27 -04:00
|
|
|
|
2002-06-29 18:34:09 -04:00
|
|
|
6. Updated MultiSearcher, MultiFieldQueryParser, Constants, DateFilter,
|
2002-05-06 14:10:55 -04:00
|
|
|
LowerCaseTokenizer javadoc (otis)
|
2002-05-04 14:26:27 -04:00
|
|
|
|
2002-05-06 14:10:55 -04:00
|
|
|
7. Added fix to avoid NullPointerException in results.jsp
|
|
|
|
(Mark Hayes via otis)
|
2002-05-04 14:26:27 -04:00
|
|
|
|
2002-05-06 14:10:55 -04:00
|
|
|
8. Changed Wildcard search to find 0 or more char instead of 1 or more
|
|
|
|
(Lee Mallabone, via otis)
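The difference item 8 describes can be shown with a small stand-alone matcher. This sketch assumes the item refers to the "*" wildcard and that "?" stands for exactly one character; WildcardSketch is a hypothetical class and not Lucene's WildcardQuery code.

```java
// Hypothetical illustration of "zero or more" wildcard semantics.
final class WildcardSketch {
    // Returns true if text matches pattern, where '*' stands for zero or
    // more characters and '?' stands for exactly one character.
    static boolean matches(String pattern, String text) {
        return matches(pattern, 0, text, 0);
    }

    private static boolean matches(String p, int pi, String t, int ti) {
        if (pi == p.length()) {
            return ti == t.length();
        }
        char c = p.charAt(pi);
        if (c == '*') {
            // Try every possible length for the '*', starting with zero.
            for (int next = ti; next <= t.length(); next++) {
                if (matches(p, pi + 1, t, next)) {
                    return true;
                }
            }
            return false;
        }
        if (ti < t.length() && (c == '?' || c == t.charAt(ti))) {
            return matches(p, pi + 1, t, ti + 1);
        }
        return false;
    }
}
```

Under these rules the pattern "foo*" matches both "foo" and "food", whereas a "one or more" interpretation would reject "foo".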
|
2002-05-04 14:26:27 -04:00
|
|
|
|
2002-05-06 14:10:55 -04:00
|
|
|
9. Fixed error in offset issue in GermanStemFilter - Bug #7412
|
|
|
|
(Rodrigo Reyes, via otis)
|
2002-05-04 14:26:27 -04:00
|
|
|
|
|
|
|
10. Added unit tests for wildcard search and DateFilter (otis)
|
|
|
|
|
2002-05-06 14:10:55 -04:00
|
|
|
11. Allow co-existence of indexed and non-indexed fields with the same name
|
|
|
|
(cutting/casper, via otis)
|
2002-05-04 14:26:27 -04:00
|
|
|
|
2002-06-29 18:34:09 -04:00
|
|
|
12. Add escape character to query parser.
|
2002-05-06 20:24:22 -04:00
|
|
|
(briangoetz)
|
|
|
|
|
2002-05-07 17:28:51 -04:00
|
|
|
13. Applied a patch that ensures that searches that use DateFilter
|
|
|
|
don't throw an exception when no matches are found. (David Smiley, via
|
|
|
|
otis)
|
2002-06-29 18:34:09 -04:00
|
|
|
|
2002-05-11 09:32:30 -04:00
|
|
|
14. Fixed bugs in DateFilter and WildcardQuery unit tests. (cutting, otis, carlson)
|
2002-06-29 18:34:09 -04:00
|
|
|
|
|
|
|
|
2002-02-08 18:01:15 -05:00
|
|
|
1.2 RC4
|
|
|
|
|
2002-02-14 14:31:25 -05:00
|
|
|
1. Updated contributions section of website.
|
|
|
|
Add XML Document #3 implementation to Document Section.
|
|
|
|
Also added Term Highlighting to Misc Section. (carlson)
|
2002-02-08 18:01:15 -05:00
|
|
|
|
|
|
|
2. Fixed NullPointerException for phrase searches containing
|
2002-02-14 14:31:25 -05:00
|
|
|
unindexed terms, introduced in 1.2RC3. (cutting)
|
2002-02-08 18:01:15 -05:00
|
|
|
|
|
|
|
3. Changed document deletion code to obtain the index write lock,
|
|
|
|
enforcing the fact that document addition and deletion cannot be
|
2002-02-14 14:31:25 -05:00
|
|
|
performed concurrently. (cutting)
|
2002-02-08 18:01:15 -05:00
|
|
|
|
2002-02-14 14:31:25 -05:00
|
|
|
4. Various documentation cleanups. (otis, acoliver)
|
|
|
|
|
|
|
|
5. Updated "powered by" links. (cutting, jon)
|
|
|
|
|
|
|
|
6. Fixed a bug in the GermanStemmer. (Bernhard Messer, via otis)
|
|
|
|
|
|
|
|
7. Changed Term and Query to implement Serializable. (scottganyo)
|
|
|
|
|
|
|
|
8. Fixed to never delete indexes added with IndexWriter.addIndexes().
|
|
|
|
(cutting)
|
|
|
|
|
|
|
|
9. Upgraded to JUnit 3.7. (otis)
|
2002-02-08 18:01:15 -05:00
|
|
|
|
2001-11-04 12:23:04 -05:00
|
|
|
1.2 RC3
|
|
|
|
|
|
|
|
1. IndexWriter: fixed a bug where adding an optimized index to an
|
|
|
|
empty index failed. This was encountered using addIndexes to copy
|
|
|
|
a RAMDirectory index to an FSDirectory.
|
|
|
|
|
|
|
|
2. RAMDirectory: fixed a bug where RAMInputStream could not read
|
|
|
|
across more than a single buffer boundary.
|
|
|
|
|
|
|
|
3. Fix query parser so it accepts queries with unicode characters.
|
2002-01-27 17:27:48 -05:00
|
|
|
(briangoetz)
|
2002-02-27 18:05:15 -05:00
|
|
|
|
2001-11-04 12:23:04 -05:00
|
|
|
4. Fix query parser so that PrefixQuery is used in preference to
|
|
|
|
WildcardQuery when there's only an asterisk at the end of the
|
|
|
|
term. Previously PrefixQuery would never be used.
|
|
|
|
|
|
|
|
5. Fix tests so they compile; fix ant file so it compiles tests
|
|
|
|
properly. Added test cases for Analyzers and PriorityQueue.
|
|
|
|
|
2002-01-27 17:27:48 -05:00
|
|
|
6. Updated demos, added Getting Started documentation. (acoliver)
|
|
|
|
|
|
|
|
7. Added 'contributions' section to website & docs. (carlson)
|
|
|
|
|
|
|
|
8. Removed JavaCC from source distribution for copyright reasons.
|
|
|
|
Folks must now download this separately from Metamata in order to
|
|
|
|
compile Lucene. (cutting)
|
|
|
|
|
|
|
|
9. Substantially improved the performance of DateFilter by adding the
|
|
|
|
ability to reuse TermDocs objects. (cutting)
|
|
|
|
|
|
|
|
10. Added IndexReader methods:
|
|
|
|
public static boolean indexExists(String directory);
|
|
|
|
public static boolean indexExists(File directory);
|
|
|
|
public static boolean indexExists(Directory directory);
|
|
|
|
public static boolean isLocked(Directory directory);
|
|
|
|
public static void unlock(Directory directory);
|
|
|
|
(cutting, otis)
|
|
|
|
|
|
|
|
11. Fixed bugs in GermanAnalyzer (gschwarz)
|
|
|
|
|
2001-11-04 12:23:04 -05:00
|
|
|
|
|
|
|
1.2 RC2, 19 October 2001:
|
|
|
|
- added sources to distribution
|
|
|
|
- removed broken build scripts and libraries from distribution
|
|
|
|
- SegmentsReader: fixed potential race condition
|
|
|
|
- FSDirectory: fixed so that getDirectory(xxx,true) correctly
|
|
|
|
erases the directory contents, even when the directory
|
|
|
|
has already been accessed in this JVM.
|
|
|
|
- RangeQuery: Fix issue where an inclusive range query would
|
|
|
|
include the nearest term in the index above a non-existent
|
|
|
|
specified upper term.
|
|
|
|
- SegmentTermEnum: Fix NullPointerException in clone() method
|
|
|
|
when the Term is null.
|
|
|
|
- JDK 1.1 compatibility fix: disabled lock files for JDK 1.1,
|
|
|
|
since they rely on a feature added in JDK 1.2.
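The RangeQuery fix in the list above concerns what an inclusive upper bound should mean when the upper term is not itself in the index. The sketch below shows the intended semantics on a plain sorted set; RangeSketch is a hypothetical class and does not use Lucene.

```java
import java.util.NavigableSet;
import java.util.TreeSet;

// Hypothetical illustration of inclusive term-range semantics.
final class RangeSketch {
    // Returns the indexed terms in [lower, upper], both ends inclusive.
    // If upper is not an indexed term, the next term above it is excluded.
    static NavigableSet<String> inclusiveRange(TreeSet<String> terms, String lower, String upper) {
        return terms.subSet(lower, true, upper, true);
    }
}
```

For indexed terms apple, banana, cherry and an upper term of "cat", the range from "apple" contains only apple and banana; cherry, the nearest term above the missing upper term, is left out.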
|
|
|
|
|
|
|
|
1.2 RC1 (first Apache release), 2 October 2001:
|
|
|
|
- packages renamed from com.lucene to org.apache.lucene
|
|
|
|
- license switched from LGPL to Apache
|
|
|
|
- ant-only build -- no more makefiles
|
|
|
|
- addition of lock files--now fully thread & process safe
|
|
|
|
- addition of German stemmer
|
|
|
|
- MultiSearcher now supports low-level search API
|
|
|
|
- added RangeQuery, for term-range searching
|
|
|
|
- Analyzers can choose tokenizer based on field name
|
|
|
|
- misc bug fixes.
|
|
|
|
|
|
|
|
1.01b (last Sourceforge release), 2 July 2001
|
|
|
|
. a few bug fixes
|
|
|
|
. new Query Parser
|
|
|
|
. new prefix query (search for "foo*" matches "food")
|
|
|
|
|
|
|
|
1.0, 2000-10-04
|
|
|
|
|
|
|
|
This release fixes a few serious bugs and also includes some
|
|
|
|
performance optimizations, a stemmer, and a few other minor
|
|
|
|
enhancements.
|
|
|
|
|
|
|
|
0.04 2000-04-19
|
|
|
|
|
|
|
|
Lucene now includes a grammar-based tokenizer, StandardTokenizer.
|
|
|
|
|
|
|
|
The only tokenizer included in the previous release (LetterTokenizer)
|
|
|
|
identified terms consisting entirely of alphabetic characters. The
|
|
|
|
new tokenizer uses a regular-expression grammar to identify more
|
|
|
|
complex classes of terms, including numbers, acronyms, email
|
|
|
|
addresses, etc.
|
|
|
|
|
|
|
|
StandardTokenizer serves two purposes:
|
|
|
|
|
|
|
|
1. It is a much better, general purpose tokenizer for use by
|
|
|
|
applications as is.
|
|
|
|
|
|
|
|
The easiest way for applications to start using
|
|
|
|
StandardTokenizer is to use StandardAnalyzer.
|
|
|
|
|
|
|
|
2. It provides a good example of grammar-based tokenization.
|
|
|
|
|
|
|
|
If an application has special tokenization requirements, it can
|
|
|
|
implement a custom tokenizer by copying the directory containing
|
|
|
|
the new tokenizer into the application and modifying it
|
|
|
|
accordingly.
|
|
|
|
|
|
|
|
0.01, 2000-03-30
|
|
|
|
|
|
|
|
First open source release.
|
|
|
|
|
|
|
|
The code has been re-organized into a new package and directory
|
|
|
|
structure for this release. It builds OK, but has not been tested
|
|
|
|
beyond that since the re-organization.
|