mirror of https://github.com/apache/lucene.git
542 lines
20 KiB
Plaintext
542 lines
20 KiB
Plaintext
Lucene Change Log
|
||
|
||
$Id$
|
||
|
||
1.4 RC1
|
||
|
||
1. Changed the format of the .tis file, so that:
|
||
|
||
- it has a format version number, which makes it easier to
|
||
back-compatibly change file formats in the future.
|
||
|
||
- the term count is now stored as a long. This was the one aspect
|
||
of the Lucene's file formats which limited index size.
|
||
|
||
- a few internal index parameters are now stored in the index, so
|
||
that they can (in theory) now be changed from index to index,
|
||
although there is not yet an API to do so.
|
||
|
||
These changes are back compatible. The new code can read old
|
||
indexes. But old code will not be able read new indexes. (cutting)
|
||
|
||
2. Added an optimized implementation of TermDocs.skipTo(). A skip
|
||
table is now stored for each term in the .frq file. This only
|
||
adds a percent or two to overall index size, but can substantially
|
||
speedup many searches. (cutting)
|
||
|
||
3. Restructured the Scorer API and all Scorer implementations to take
|
||
advantage of an optimized TermDocs.skipTo() implementation. In
|
||
particular, PhraseQuerys and conjunctive BooleanQuerys are
|
||
faster when one clause has substantially fewer matches than the
|
||
others. (A conjunctive BooleanQuery is a BooleanQuery where all
|
||
clauses are required.) (cutting)
|
||
|
||
4. Added new class ParallelMultiSearcher. Combined with
|
||
RemoteSearchable this makes it easy to implement distributed
|
||
search systems. (Jean-Francois Halleux via cutting)
|
||
|
||
5. Added support for hit sorting. Results may now be sorted by any
|
||
indexed field. For details see the javadoc for
|
||
Searcher#search(Query, Sort). (Tim Jones via Cutting)
|
||
|
||
6. Changed FSDirectory to auto-create a full directory tree that it
|
||
needs by using mkdirs() instead of mkdir(). (Mladen Turk via Otis)
|
||
|
||
7. Added a new span-based query API. This implements, among other
|
||
things, nested phrases. See javadocs for details. (Doug Cutting)
|
||
|
||
8. Added new method Query.getSimilarity(Searcher), and changed
|
||
scorers to use it. This permits one to subclass a Query class so
|
||
that it can specify it's own Similarity implementation, perhaps
|
||
one that delegates through that of the Searcher. (Julien Nioche
|
||
via Cutting)
|
||
|
||
9. Added MultiReader, an IndexReader that combines multiple other
|
||
IndexReaders. (Cutting)
|
||
|
||
10. Added support for term vectors. See Field#isTermVectorStored().
|
||
(Grant Ingersoll, Cutting & Dmitry)
|
||
|
||
11. Fixed the old bug with escaping of special characters in query
|
||
strings: http://issues.apache.org/bugzilla/show_bug.cgi?id=24665
|
||
(Jean-Francois Halleux via Otis)
|
||
|
||
12. Added support for overriding default values for the following,
|
||
using system properties:
|
||
- default commit lock timeout
|
||
- default maxFieldLength
|
||
- default maxMergeDocs
|
||
- default mergeFactor
|
||
- default minMergeDocs
|
||
- default write lock timeout
|
||
(Otis)
|
||
|
||
13. Changed QueryParser.jj to allow '-' and '+' within tokens:
|
||
http://issues.apache.org/bugzilla/show_bug.cgi?id=27491
|
||
(Morus Walter via Otis)
|
||
|
||
14. Changed so that the compound index format is used by default.
|
||
This makes indexing a bit slower, but vastly reduces the chances
|
||
of file handle problems. (Cutting)
|
||
|
||
|
||
1.3 final
|
||
|
||
1. Added catch of BooleanQuery$TooManyClauses in QueryParser to
|
||
throw ParseException instead. (Erik Hatcher)
|
||
|
||
2. Fixed a NullPointerException in Query.explain(). (Doug Cutting)
|
||
|
||
3. Added a new method IndexReader.setNorm(), that permits one to
|
||
alter the boosting of fields after an index is created.
|
||
|
||
4. Distinguish between the final position and length when indexing a
|
||
field. The length is now defined as the total number of tokens,
|
||
instead of the final position, as it was previously. Length is
|
||
used for score normalization (Similarity.lengthNorm()) and for
|
||
controlling memory usage (IndexWriter.maxFieldLength). In both of
|
||
these cases, the total number of tokens is a better value to use
|
||
than the final token position. Position is used in phrase
|
||
searching (see PhraseQuery and Token.setPositionIncrement()).
|
||
|
||
5. Fix StandardTokenizer's handling of CJK characters (Chinese,
|
||
Japanese and Korean ideograms). Previously contiguous sequences
|
||
were combined in a single token, which is not very useful. Now
|
||
each ideogram generates a separate token, which is more useful.
|
||
|
||
|
||
1.3 RC3
|
||
|
||
1. Added minMergeDocs in IndexWriter. This can be raised to speed
|
||
indexing without altering the number of files, but only using more
|
||
memory. (Julien Nioche via Otis)
|
||
|
||
2. Fix bug #24786, in query rewriting. (bschneeman via Cutting)
|
||
|
||
3. Fix bug #16952, in demo HTML parser, skip comments in
|
||
javascript. (Christoph Goller)
|
||
|
||
4. Fix bug #19253, in demo HTML parser, add whitespace as needed to
|
||
output (Daniel Naber via Christoph Goller)
|
||
|
||
5. Fix bug #24301, in demo HTML parser, long titles no longer
|
||
hang things. (Christoph Goller)
|
||
|
||
6. Fix bug #23534, Replace use of file timestamp of segments file
|
||
with an index version number stored in the segments file. This
|
||
resolves problems when running on file systems with low-resolution
|
||
timestamps, e.g., HFS under MacOS X. (Christoph Goller)
|
||
|
||
7. Fix QueryParser so that TokenMgrError is not thrown, only
|
||
ParseException. (Erik Hatcher)
|
||
|
||
8. Fix some bugs introduced by change 11 of RC2. (Christoph Goller)
|
||
|
||
9. Fixed a problem compiling TestRussianStem. (Christoph Goller)
|
||
|
||
10. Cleaned up some build stuff. (Erik Hatcher)
|
||
|
||
|
||
1.3 RC2
|
||
|
||
1. Added getFieldNames(boolean) to IndexReader, SegmentReader, and
|
||
SegmentsReader. (Julien Nioche via otis)
|
||
|
||
2. Changed file locking to place lock files in
|
||
System.getProperty("java.io.tmpdir"), where all users are
|
||
permitted to write files. This way folks can open and correctly
|
||
lock indexes which are read-only to them.
|
||
|
||
3. IndexWriter: added a new method, addDocument(Document, Analyzer),
|
||
permitting one to easily use different analyzers for different
|
||
documents in the same index.
|
||
|
||
4. Minor enhancements to FuzzyTermEnum.
|
||
(Christoph Goller via Otis)
|
||
|
||
5. PriorityQueue: added insert(Object) method and adjusted IndexSearcher
|
||
and MultiIndexSearcher to use it.
|
||
(Christoph Goller via Otis)
|
||
|
||
6. Fixed a bug in IndexWriter that returned incorrect docCount().
|
||
(Christoph Goller via Otis)
|
||
|
||
7. Fixed SegmentsReader to eliminate the confusing and slightly different
|
||
behaviour of TermEnum when dealing with an enumeration of all terms,
|
||
versus an enumeration starting from a specific term.
|
||
This patch also fixes incorrect term document frequences when the same term
|
||
is present in multiple segments.
|
||
(Christoph Goller via Otis)
|
||
|
||
8. Added CachingWrapperFilter and PerFieldAnalyzerWrapper. (Erik Hatcher)
|
||
|
||
9. Added support for the new "compound file" index format (Dmitry
|
||
Serebrennikov)
|
||
|
||
10. Added Locale setting to QueryParser, for use by date range parsing.
|
||
|
||
11. Changed IndexReader so that it can be subclassed by classes
|
||
outside of its package. Previously it had package-private
|
||
abstract methods. Also modified the index merging code so that it
|
||
can work on an arbitrary IndexReader implementation, and added a
|
||
new method, IndexWriter.addIndexes(IndexReader[]), to take
|
||
advantage of this. (cutting)
|
||
|
||
12. Added a limit to the number of clauses which may be added to a
|
||
BooleanQuery. The default limit is 1024 clauses. This should
|
||
stop most OutOfMemoryExceptions by prefix, wildcard and fuzzy
|
||
queries which run amok. (cutting)
|
||
|
||
13. Add new method: IndexReader.undeleteAll(). This undeletes all
|
||
deleted documents which still remain in the index. (cutting)
|
||
|
||
|
||
1.3 RC1
|
||
|
||
1. Fixed PriorityQueue's clear() method.
|
||
Fix for bug 9454, http://nagoya.apache.org/bugzilla/show_bug.cgi?id=9454
|
||
(Matthijs Bomhoff via otis)
|
||
|
||
2. Changed StandardTokenizer.jj grammar for EMAIL tokens.
|
||
Fix for bug 9015, http://nagoya.apache.org/bugzilla/show_bug.cgi?id=9015
|
||
(Dale Anson via otis)
|
||
|
||
3. Added the ability to disable lock creation by using disableLuceneLocks
|
||
system property. This is useful for read-only media, such as CD-ROMs.
|
||
(otis)
|
||
|
||
4. Added id method to Hits to be able to access the index global id.
|
||
Required for sorting options.
|
||
(carlson)
|
||
|
||
5. Added support for new range query syntax to QueryParser.jj.
|
||
(briangoetz)
|
||
|
||
6. Added the ability to retrieve HTML documents' META tag values to
|
||
HTMLParser.jj.
|
||
(Mark Harwood via otis)
|
||
|
||
7. Modified QueryParser to make it possible to programmatically specify the
|
||
default Boolean operator (OR or AND).
|
||
(P<>ter Hal<61>csy via otis)
|
||
|
||
8. Made many search methods and classes non-final, per requests.
|
||
This includes IndexWriter and IndexSearcher, among others.
|
||
(cutting)
|
||
|
||
9. Added class RemoteSearchable, providing support for remote
|
||
searching via RMI. The test class RemoteSearchableTest.java
|
||
provides an example of how this can be used. (cutting)
|
||
|
||
10. Added PhrasePrefixQuery (and supporting MultipleTermPositions). The
|
||
test class TestPhrasePrefixQuery provides the usage example.
|
||
(Anders Nielsen via otis)
|
||
|
||
11. Changed the German stemming algorithm to ignore case while
|
||
stripping. The new algorithm is faster and produces more equal
|
||
stems from nouns and verbs derived from the same word.
|
||
(gschwarz)
|
||
|
||
12. Added support for boosting the score of documents and fields via
|
||
the new methods Document.setBoost(float) and Field.setBoost(float).
|
||
|
||
Note: This changes the encoding of an indexed value. Indexes
|
||
should be re-created from scratch in order for search scores to
|
||
be correct. With the new code and an old index, searches will
|
||
yield very large scores for shorter fields, and very small scores
|
||
for longer fields. Once the index is re-created, scores will be
|
||
as before. (cutting)
|
||
|
||
13. Added new method Token.setPositionIncrement().
|
||
|
||
This permits, for the purpose of phrase searching, placing
|
||
multiple terms in a single position. This is useful with
|
||
stemmers that produce multiple possible stems for a word.
|
||
|
||
This also permits the introduction of gaps between terms, so that
|
||
terms which are adjacent in a token stream will not be matched by
|
||
and exact phrase query. This makes it possible, e.g., to build
|
||
an analyzer where phrases are not matched over stop words which
|
||
have been removed.
|
||
|
||
Finally, repeating a token with an increment of zero can also be
|
||
used to boost scores of matches on that token. (cutting)
|
||
|
||
14. Added new Filter class, QueryFilter. This constrains search
|
||
results to only match those which also match a provided query.
|
||
Results are cached, so that searches after the first on the same
|
||
index using this filter are very fast.
|
||
|
||
This could be used, for example, with a RangeQuery on a formatted
|
||
date field to implement date filtering. One could re-use a
|
||
single QueryFilter that matches, e.g., only documents modified
|
||
within the last week. The QueryFilter and RangeQuery would only
|
||
need to be reconstructed once per day. (cutting)
|
||
|
||
15. Added a new IndexWriter method, getAnalyzer(). This returns the
|
||
analyzer used when adding documents to this index. (cutting)
|
||
|
||
16. Fixed a bug with IndexReader.lastModified(). Before, document
|
||
deletion did not update this. Now it does. (cutting)
|
||
|
||
17. Added Russian Analyzer.
|
||
(Boris Okner via otis)
|
||
|
||
18. Added a public, extensible scoring API. For details, see the
|
||
javadoc for org.apache.lucene.search.Similarity.
|
||
|
||
19. Fixed return of Hits.id() from float to int. (Terry Steichen via Peter).
|
||
|
||
20. Added getFieldNames() to IndexReader and Segment(s)Reader classes.
|
||
(Peter Mularien via otis)
|
||
|
||
21. Added getFields(String) and getValues(String) methods.
|
||
Contributed by Rasik Pandey on 2002-10-09
|
||
(Rasik Pandey via otis)
|
||
|
||
22. Revised internal search APIs. Changes include:
|
||
|
||
a. Queries are no longer modified during a search. This makes
|
||
it possible, e.g., to reuse the same query instance with
|
||
multiple indexes from multiple threads.
|
||
|
||
b. Term-expanding queries (e.g. PrefixQuery, WildcardQuery,
|
||
etc.) now work correctly with MultiSearcher, fixing bugs 12619
|
||
and 12667.
|
||
|
||
c. Boosting BooleanQuery's now works, and is supported by the
|
||
query parser (problem reported by Lee Mallabone). Thus a query
|
||
like "(+foo +bar)^2 +baz" is now supported and equivalent to
|
||
"(+foo^2 +bar^2) +baz".
|
||
|
||
d. New method: Query.rewrite(IndexReader). This permits a
|
||
query to re-write itself as an alternate, more primitive query.
|
||
Most of the term-expanding query classes (PrefixQuery,
|
||
WildcardQuery, etc.) are now implemented using this method.
|
||
|
||
e. New method: Searchable.explain(Query q, int doc). This
|
||
returns an Explanation instance that describes how a particular
|
||
document is scored against a query. An explanation can be
|
||
displayed as either plain text, with the toString() method, or
|
||
as HTML, with the toHtml() method. Note that computing an
|
||
explanation is as expensive as executing the query over the
|
||
entire index. This is intended to be used in developing
|
||
Similarity implementations, and, for good performance, should
|
||
not be displayed with every hit.
|
||
|
||
f. Scorer and Weight are public, not package protected. It now
|
||
possible for someone to write a Scorer implementation that is
|
||
not in the org.apache.lucene.search package. This is still
|
||
fairly advanced programming, and I don't expect anyone to do
|
||
this anytime soon, but at least now it is possible.
|
||
|
||
g. Added public accessors to the primitive query classes
|
||
(TermQuery, PhraseQuery and BooleanQuery), permitting access to
|
||
their terms and clauses.
|
||
|
||
Caution: These are extensive changes and they have not yet been
|
||
tested extensively. Bug reports are appreciated.
|
||
(cutting)
|
||
|
||
23. Added convenience RAMDirectory constructors taking File and String
|
||
arguments, for easy FSDirectory to RAMDirectory conversion.
|
||
(otis)
|
||
|
||
24. Added code for manual renaming of files in FSDirectory, since it
|
||
has been reported that java.io.File's renameTo(File) method sometimes
|
||
fails on Windows JVMs.
|
||
(Matt Tucker via otis)
|
||
|
||
25. Refactored QueryParser to make it easier for people to extend it.
|
||
Added the ability to automatically lower-case Wildcard terms in
|
||
the QueryParser.
|
||
(Tatu Saloranta via otis)
|
||
|
||
|
||
1.2 RC6
|
||
|
||
1. Changed QueryParser.jj to have "?" be a special character which
|
||
allowed it to be used as a wildcard term. Updated TestWildcard
|
||
unit test also. (Ralf Hettesheimer via carlson)
|
||
|
||
1.2 RC5
|
||
|
||
1. Renamed build.properties to default.properties and updated
|
||
the BUILD.txt document to describe how to override the
|
||
default.property settings without having to edit the file. This
|
||
brings the build process closer to Scarab's build process.
|
||
(jon)
|
||
|
||
2. Added MultiFieldQueryParser class. (Kelvin Tan, via otis)
|
||
|
||
3. Updated "powered by" links. (otis)
|
||
|
||
4. Fixed instruction for setting up JavaCC - Bug #7017 (otis)
|
||
|
||
5. Added throwing exception if FSDirectory could not create diectory
|
||
- Bug #6914 (Eugene Gluzberg via otis)
|
||
|
||
6. Update MultiSearcher, MultiFieldParse, Constants, DateFilter,
|
||
LowerCaseTokenizer javadoc (otis)
|
||
|
||
7. Added fix to avoid NullPointerException in results.jsp
|
||
(Mark Hayes via otis)
|
||
|
||
8. Changed Wildcard search to find 0 or more char instead of 1 or more
|
||
(Lee Mallobone, via otis)
|
||
|
||
9. Fixed error in offset issue in GermanStemFilter - Bug #7412
|
||
(Rodrigo Reyes, via otis)
|
||
|
||
10. Added unit tests for wildcard search and DateFilter (otis)
|
||
|
||
11. Allow co-existence of indexed and non-indexed fields with the same name
|
||
(cutting/casper, via otis)
|
||
|
||
12. Add escape character to query parser.
|
||
(briangoetz)
|
||
|
||
13. Applied a patch that ensures that searches that use DateFilter
|
||
don't throw an exception when no matches are found. (David Smiley, via
|
||
otis)
|
||
|
||
14. Fixed bugs in DateFilter and wildcardquery unit tests. (cutting, otis, carlson)
|
||
|
||
|
||
1.2 RC4
|
||
|
||
1. Updated contributions section of website.
|
||
Add XML Document #3 implementation to Document Section.
|
||
Also added Term Highlighting to Misc Section. (carlson)
|
||
|
||
2. Fixed NullPointerException for phrase searches containing
|
||
unindexed terms, introduced in 1.2RC3. (cutting)
|
||
|
||
3. Changed document deletion code to obtain the index write lock,
|
||
enforcing the fact that document addition and deletion cannot be
|
||
performed concurrently. (cutting)
|
||
|
||
4. Various documentation cleanups. (otis, acoliver)
|
||
|
||
5. Updated "powered by" links. (cutting, jon)
|
||
|
||
6. Fixed a bug in the GermanStemmer. (Bernhard Messer, via otis)
|
||
|
||
7. Changed Term and Query to implement Serializable. (scottganyo)
|
||
|
||
8. Fixed to never delete indexes added with IndexWriter.addIndexes().
|
||
(cutting)
|
||
|
||
9. Upgraded to JUnit 3.7. (otis)
|
||
|
||
1.2 RC3
|
||
|
||
1. IndexWriter: fixed a bug where adding an optimized index to an
|
||
empty index failed. This was encountered using addIndexes to copy
|
||
a RAMDirectory index to an FSDirectory.
|
||
|
||
2. RAMDirectory: fixed a bug where RAMInputStream could not read
|
||
across more than across a single buffer boundary.
|
||
|
||
3. Fix query parser so it accepts queries with unicode characters.
|
||
(briangoetz)
|
||
|
||
4. Fix query parser so that PrefixQuery is used in preference to
|
||
WildcardQuery when there's only an asterisk at the end of the
|
||
term. Previously PrefixQuery would never be used.
|
||
|
||
5. Fix tests so they compile; fix ant file so it compiles tests
|
||
properly. Added test cases for Analyzers and PriorityQueue.
|
||
|
||
6. Updated demos, added Getting Started documentation. (acoliver)
|
||
|
||
7. Added 'contributions' section to website & docs. (carlson)
|
||
|
||
8. Removed JavaCC from source distribution for copyright reasons.
|
||
Folks must now download this separately from metamata in order to
|
||
compile Lucene. (cutting)
|
||
|
||
9. Substantially improved the performance of DateFilter by adding the
|
||
ability to reuse TermDocs objects. (cutting)
|
||
|
||
10. Added IndexReader methods:
|
||
public static boolean indexExists(String directory);
|
||
public static boolean indexExists(File directory);
|
||
public static boolean indexExists(Directory directory);
|
||
public static boolean isLocked(Directory directory);
|
||
public static void unlock(Directory directory);
|
||
(cutting, otis)
|
||
|
||
11. Fixed bugs in GermanAnalyzer (gschwarz)
|
||
|
||
|
||
1.2 RC2, 19 October 2001:
|
||
- added sources to distribution
|
||
- removed broken build scripts and libraries from distribution
|
||
- SegmentsReader: fixed potential race condition
|
||
- FSDirectory: fixed so that getDirectory(xxx,true) correctly
|
||
erases the directory contents, even when the directory
|
||
has already been accessed in this JVM.
|
||
- RangeQuery: Fix issue where an inclusive range query would
|
||
include the nearest term in the index above a non-existant
|
||
specified upper term.
|
||
- SegmentTermEnum: Fix NullPointerException in clone() method
|
||
when the Term is null.
|
||
- JDK 1.1 compatibility fix: disabled lock files for JDK 1.1,
|
||
since they rely on a feature added in JDK 1.2.
|
||
|
||
1.2 RC1 (first Apache release), 2 October 2001:
|
||
- packages renamed from com.lucene to org.apache.lucene
|
||
- license switched from LGPL to Apache
|
||
- ant-only build -- no more makefiles
|
||
- addition of lock files--now fully thread & process safe
|
||
- addition of German stemmer
|
||
- MultiSearcher now supports low-level search API
|
||
- added RangeQuery, for term-range searching
|
||
- Analyzers can choose tokenizer based on field name
|
||
- misc bug fixes.
|
||
|
||
1.01b (last Sourceforge release), 2 July 2001
|
||
. a few bug fixes
|
||
. new Query Parser
|
||
. new prefix query (search for "foo*" matches "food")
|
||
|
||
1.0, 2000-10-04
|
||
|
||
This release fixes a few serious bugs and also includes some
|
||
performance optimizations, a stemmer, and a few other minor
|
||
enhancements.
|
||
|
||
0.04 2000-04-19
|
||
|
||
Lucene now includes a grammar-based tokenizer, StandardTokenizer.
|
||
|
||
The only tokenizer included in the previous release (LetterTokenizer)
|
||
identified terms consisting entirely of alphabetic characters. The
|
||
new tokenizer uses a regular-expression grammar to identify more
|
||
complex classes of terms, including numbers, acronyms, email
|
||
addresses, etc.
|
||
|
||
StandardTokenizer serves two purposes:
|
||
|
||
1. It is a much better, general purpose tokenizer for use by
|
||
applications as is.
|
||
|
||
The easiest way for applications to start using
|
||
StandardTokenizer is to use StandardAnalyzer.
|
||
|
||
2. It provides a good example of grammar-based tokenization.
|
||
|
||
If an application has special tokenization requirements, it can
|
||
implement a custom tokenizer by copying the directory containing
|
||
the new tokenizer into the application and modifying it
|
||
accordingly.
|
||
|
||
0.01, 2000-03-30
|
||
|
||
First open source release.
|
||
|
||
The code has been re-organized into a new package and directory
|
||
structure for this release. It builds OK, but has not been tested
|
||
beyond that since the re-organization.
|