Lucene Change Log

$Id$

1.3 DEV1

 1. Fixed PriorityQueue's clear() method.
    Fix for bug 9454, http://nagoya.apache.org/bugzilla/show_bug.cgi?id=9454
    (Matthijs Bomhoff via otis)

 2. Changed StandardTokenizer.jj grammar for EMAIL tokens.
    Fix for bug 9015, http://nagoya.apache.org/bugzilla/show_bug.cgi?id=9015
    (Dale Anson via otis)

 3. Added the ability to disable lock creation by using disableLuceneLocks
    system property.  This is useful for read-only media, such as CD-ROMs.
    (otis)

 4. Added id method to Hits to be able to access the index global id.
    Required for sorting options.
    (carlson)

 5. Added support for new range query syntax to QueryParser.jj.
    (briangoetz)

 6. Added the ability to retrieve HTML documents' META tag values to
    HTMLParser.jj.
    (Mark Harwood via otis)

 7. Modified QueryParser to make it possible to programmatically specify the
    default Boolean operator (OR or AND).
    (Péter Halácsy via otis)

 8. Made many search methods and classes non-final, per requests.
    This includes IndexWriter and IndexSearcher, among others.
    (cutting)

 9. Added class RemoteSearchable, providing support for remote
    searching via RMI.  The test class RemoteSearchableTest.java
    provides an example of how this can be used.
    (cutting)

10. Added PhrasePrefixQuery (and supporting MultipleTermPositions).  The
    test class TestPhrasePrefixQuery provides the usage example.
    (Anders Nielsen via otis)

11. Changed the German stemming algorithm to ignore case while
    stripping.  The new algorithm is faster and produces more equal
    stems from nouns and verbs derived from the same word.
    (gschwarz)

12. Added support for boosting the score of documents and fields via
    the new methods Document.setBoost(float) and Field.setBoost(float).

    Note: This changes the encoding of an indexed value.  Indexes
    should be re-created from scratch in order for search scores to
    be correct.  With the new code and an old index, searches will
    yield very large scores for shorter fields, and very small scores
    for longer fields.
    Once the index is re-created, scores will be as before.
    (cutting)

13. Added new method Token.setPositionIncrement().

    This permits, for the purpose of phrase searching, placing
    multiple terms in a single position.  This is useful with
    stemmers that produce multiple possible stems for a word.

    This also permits the introduction of gaps between terms, so that
    terms which are adjacent in a token stream will not be matched by
    an exact phrase query.  This makes it possible, e.g., to build
    an analyzer where phrases are not matched over stop words which
    have been removed.

    Finally, repeating a token with an increment of zero can also be
    used to boost scores of matches on that token.
    (cutting)

14. Added new Filter class, QueryFilter.  This constrains search
    results to only match those which also match a provided query.
    Results are cached, so that searches after the first on the same
    index using this filter are very fast.

    This could be used, for example, with a RangeQuery on a formatted
    date field to implement date filtering.  One could re-use a
    single QueryFilter that matches, e.g., only documents modified
    within the last week.  The QueryFilter and RangeQuery would only
    need to be reconstructed once per day.
    (cutting)

15. Added a new IndexWriter method, getAnalyzer().  This returns the
    analyzer used when adding documents to this index.
    (cutting)

16. Fixed a bug with IndexReader.lastModified().  Before, document
    deletion did not update this.  Now it does.
    (cutting)

17. Added Russian Analyzer.
    (Boris Okner via otis)

18. Added a public, extensible scoring API.  For details, see the
    javadoc for org.apache.lucene.search.Similarity.

19. Fixed the return type of Hits.id(), changing it from float to int.
    (Terry Steichen via Peter)

20. Added getFieldNames() to IndexReader and Segment(s)Reader classes.
    (Peter Mularien via otis)

21. Added getFields(String) and getValues(String) methods.
    (Rasik Pandey via otis)

22. Revised internal search APIs.  Changes include:

    a. Queries are no longer modified during a search.
       This makes it possible, e.g., to reuse the same query instance
       with multiple indexes from multiple threads.

    b. Term-expanding queries (e.g. PrefixQuery, WildcardQuery,
       etc.) now work correctly with MultiSearcher, fixing bugs 12619
       and 12667.

    c. Boosting BooleanQuery instances now works, and is supported by
       the query parser (problem reported by Lee Mallabone).  Thus a
       query like "(+foo +bar)^2 +baz" is now supported and equivalent
       to "(+foo^2 +bar^2) +baz".

    d. New method: Query.rewrite(IndexReader).  This permits a
       query to re-write itself as an alternate, more primitive query.
       Most of the term-expanding query classes (PrefixQuery,
       WildcardQuery, etc.) are now implemented using this method.

    e. New method: Searchable.explain(Query q, int doc).  This
       returns an Explanation instance that describes how a particular
       document is scored against a query.  An explanation can be
       displayed as either plain text, with the toString() method, or
       as HTML, with the toHtml() method.  Note that computing an
       explanation is as expensive as executing the query over the
       entire index.  This is intended to be used in developing
       Similarity implementations, and, for good performance, should
       not be displayed with every hit.

    f. Scorer and Weight are public, not package protected.  It is now
       possible for someone to write a Scorer implementation that is
       not in the org.apache.lucene.search package.  This is still
       fairly advanced programming, and I don't expect anyone to do
       this anytime soon, but at least now it is possible.

    Caution: These are extensive changes and they have not yet been
    tested extensively.  Bug reports are appreciated.

    Contributed by Rasik Pandey on 2002-10-09


1.2 RC6

 1. Changed QueryParser.jj to make "?" a special character, allowing
    it to be used as a wildcard in terms.  Also updated the
    TestWildcard unit test.
    (Ralf Hettesheimer via carlson)


1.2 RC5

 1. Renamed build.properties to default.properties and updated
    the BUILD.txt document to describe how to override the
    default.properties settings without having to edit the file.  This
    brings the build process closer to Scarab's build process.
    (jon)

 2. Added MultiFieldQueryParser class.
    (Kelvin Tan, via otis)

 3. Updated "powered by" links.
    (otis)

 4. Fixed instruction for setting up JavaCC - Bug #7017
    (otis)

 5. Added throwing of an exception if FSDirectory could not create
    the directory - Bug #6914
    (Eugene Gluzberg via otis)

 6. Updated MultiSearcher, MultiFieldParse, Constants, DateFilter,
    LowerCaseTokenizer javadoc
    (otis)

 7. Added fix to avoid NullPointerException in results.jsp
    (Mark Hayes via otis)

 8. Changed Wildcard search to find 0 or more char instead of 1 or more
    (Lee Mallabone, via otis)

 9. Fixed offset error in GermanStemFilter - Bug #7412
    (Rodrigo Reyes, via otis)

10. Added unit tests for wildcard search and DateFilter
    (otis)

11. Allow co-existence of indexed and non-indexed fields with the same name
    (cutting/casper, via otis)

12. Added escape character to query parser.
    (briangoetz)

13. Applied a patch that ensures that searches that use DateFilter
    don't throw an exception when no matches are found.
    (David Smiley, via otis)

14. Fixed bugs in DateFilter and WildcardQuery unit tests.
    (cutting, otis, carlson)


1.2 RC4

 1. Updated contributions section of website.
    Added XML Document #3 implementation to Document Section.
    Also added Term Highlighting to Misc Section.
    (carlson)

 2. Fixed NullPointerException for phrase searches containing
    unindexed terms, introduced in 1.2RC3.
    (cutting)

 3. Changed document deletion code to obtain the index write lock,
    enforcing the fact that document addition and deletion cannot be
    performed concurrently.
    (cutting)

 4. Various documentation cleanups.
    (otis, acoliver)

 5. Updated "powered by" links.
    (cutting, jon)

 6. Fixed a bug in the GermanStemmer.
    (Bernhard Messer, via otis)

 7. Changed Term and Query to implement Serializable.
    (scottganyo)

 8. Fixed to never delete indexes added with IndexWriter.addIndexes().
    (cutting)

 9. Upgraded to JUnit 3.7.
    (otis)


1.2 RC3

 1. IndexWriter: fixed a bug where adding an optimized index to an
    empty index failed.  This was encountered using addIndexes to copy
    a RAMDirectory index to an FSDirectory.

 2. RAMDirectory: fixed a bug where RAMInputStream could not read
    across more than a single buffer boundary.

 3. Fixed query parser so it accepts queries with Unicode characters.
    (briangoetz)

 4. Fixed query parser so that PrefixQuery is used in preference to
    WildcardQuery when there's only an asterisk at the end of the
    term.  Previously PrefixQuery would never be used.

 5. Fixed tests so they compile; fixed ant file so it compiles tests
    properly.  Added test cases for Analyzers and PriorityQueue.

 6. Updated demos, added Getting Started documentation.
    (acoliver)

 7. Added 'contributions' section to website & docs.
    (carlson)

 8. Removed JavaCC from source distribution for copyright reasons.
    Folks must now download this separately from metamata in order to
    compile Lucene.
    (cutting)

 9. Substantially improved the performance of DateFilter by adding the
    ability to reuse TermDocs objects.
    (cutting)

10. Added IndexReader methods:
      public static boolean indexExists(String directory);
      public static boolean indexExists(File directory);
      public static boolean indexExists(Directory directory);
      public static boolean isLocked(Directory directory);
      public static void unlock(Directory directory);
    (cutting, otis)

11. Fixed bugs in GermanAnalyzer
    (gschwarz)


1.2 RC2, 19 October 2001:

 - added sources to distribution
 - removed broken build scripts and libraries from distribution
 - SegmentsReader: fixed potential race condition
 - FSDirectory: fixed so that getDirectory(xxx,true) correctly
   erases the directory contents, even when the directory
   has already been accessed in this JVM.
 - RangeQuery: Fix issue where an inclusive range query would
   include the nearest term in the index above a non-existent
   specified upper term.
 - SegmentTermEnum: Fix NullPointerException in clone() method
   when the Term is null.
 - JDK 1.1 compatibility fix: disabled lock files for JDK 1.1,
   since they rely on a feature added in JDK 1.2.


1.2 RC1 (first Apache release), 2 October 2001:

 - packages renamed from com.lucene to org.apache.lucene
 - license switched from LGPL to Apache
 - ant-only build -- no more makefiles
 - addition of lock files--now fully thread & process safe
 - addition of German stemmer
 - MultiSearcher now supports low-level search API
 - added RangeQuery, for term-range searching
 - Analyzers can choose tokenizer based on field name
 - misc bug fixes.


1.01b (last Sourceforge release), 2 July 2001

 . a few bug fixes
 . new Query Parser
 . new prefix query (search for "foo*" matches "food")


1.0, 2000-10-04

This release fixes a few serious bugs and also includes some
performance optimizations, a stemmer, and a few other minor
enhancements.


0.04 2000-04-19

Lucene now includes a grammar-based tokenizer, StandardTokenizer.

The only tokenizer included in the previous release (LetterTokenizer)
identified terms consisting entirely of alphabetic characters.  The
new tokenizer uses a regular-expression grammar to identify more
complex classes of terms, including numbers, acronyms, email
addresses, etc.

StandardTokenizer serves two purposes:

 1. It is a much better, general purpose tokenizer for use by
    applications as is.

    The easiest way for applications to start using
    StandardTokenizer is to use StandardAnalyzer.

 2. It provides a good example of grammar-based tokenization.

    If an application has special tokenization requirements, it can
    implement a custom tokenizer by copying the directory containing
    the new tokenizer into the application and modifying it
    accordingly.


0.01, 2000-03-30

First open source release.

The code has been re-organized into a new package and directory structure for this release. It builds OK, but has not been tested beyond that since the re-organization.