LUCENE-1084: Changed all IndexWriter constructors to take an
-explicit parameter for maximum field size. Deprecated all the
-pre-existing constructors; these will be removed in release 3.0. (Steven Rowe via Mike McCandless)
-
LUCENE-1150: Re-expose StandardTokenizer's constants publicly;
-this was accidentally lost with LUCENE-966. (Nicolas Lalevée via
-Mike McCandless)
LUCENE-1137: Added Token.set/getFlags() accessors for passing more information about a Token through the analysis
- process. The flag is not indexed/stored and is thus only used by analysis.
-
-
LUCENE-1147: Add -segment option to CheckIndex tool so you can
-check only a specific segment or segments in your index. (Mike
-McCandless)
LUCENE-705: When building a compound file, use
-RandomAccessFile.setLength() to tell the OS/filesystem to
-pre-allocate space for the file. This may improve fragmentation
-in how the CFS file is stored, and allows us to detect an upcoming
-disk full situation before actually filling up the disk. (Mike
-McCandless)
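
The pre-allocation idea in LUCENE-705 can be sketched with plain java.io. This is a minimal illustration, not Lucene's compound-file writer: the class and method names are hypothetical, and whether setLength() really reserves disk blocks (rather than creating a sparse file) depends on the OS and filesystem.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;

final class PreallocateSketch {
    /** Sets the file to its final length up front, then writes the payload into it. */
    static long writePreallocated(File file, byte[] payload) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file, "rw")) {
            // Ask the OS for the full size first; a disk-full error may surface here
            // instead of midway through the write (filesystem dependent).
            raf.setLength(payload.length);
            raf.seek(0);
            raf.write(payload);
            return raf.length();
        }
    }

    /** Hypothetical helper: runs the sketch against a temp file and returns the final length. */
    static long demo(int size) {
        try {
            File file = File.createTempFile("prealloc", ".cfs");
            try {
                return writePreallocated(file, new byte[size]);
            } finally {
                file.delete();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```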
-
LUCENE-1120: Speed up merging of term vectors by bulk-copying the
-raw bytes for each contiguous range of non-deleted documents. (Mike McCandless)
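
To illustrate what "each contiguous range of non-deleted documents" means in LUCENE-1120, the sketch below finds those ranges from a deletion bitmap. It assumes a plain boolean[] of deletions and hypothetical names; the actual merger works on segment files and is not shown here.

```java
import java.util.ArrayList;
import java.util.List;

final class LiveRangesSketch {
    /** Returns [start, end) pairs, one per maximal run of non-deleted documents. */
    static List<int[]> liveRanges(boolean[] deleted) {
        List<int[]> ranges = new ArrayList<>();
        int i = 0;
        while (i < deleted.length) {
            if (deleted[i]) {
                i++;               // skip deleted docs one at a time
                continue;
            }
            int start = i;
            while (i < deleted.length && !deleted[i]) {
                i++;               // extend the run of live docs
            }
            ranges.add(new int[] {start, i});
        }
        return ranges;
    }
}
```

Each returned range could then be copied with a single bulk read and write instead of one document at a time.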
LUCENE-994: Defaults for IndexWriter have been changed to maximize
-out-of-the-box indexing speed. First, IndexWriter now flushes by
-RAM usage (16 MB by default) instead of a fixed doc count (call
-IndexWriter.setMaxBufferedDocs to get backwards compatible
-behavior). Second, ConcurrentMergeScheduler is used to run merges
-using background threads (call IndexWriter.setMergeScheduler(new
-SerialMergeScheduler()) to get backwards compatible behavior).
-Third, merges are chosen based on size in bytes of each segment
-rather than document count of each segment (call
-IndexWriter.setMergePolicy(new LogDocMergePolicy()) to get
-backwards compatible behavior).
-
-NOTE: users of ParallelReader must change back all of these
-defaults in order to ensure the docIDs "align" across all parallel
-indices. (Mike McCandless)
-
LUCENE-1045: SortField.AUTO didn't work with long. When detecting
-the field type for sorting automatically, numbers used to be
-interpreted as int, then as float, if parsing the number as an int
-failed. Now the detection checks for int, then for long,
-then for float. (Daniel Naber)
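
The detection order described in LUCENE-1045 can be sketched as follows. This is an illustration only, not the SortField code: the class, the enum, and the fall-back to STRING when no numeric parse succeeds are assumptions of the sketch.

```java
final class AutoSortTypeSketch {
    enum Kind { INT, LONG, FLOAT, STRING }

    /** Tries int first, then long, then float, mirroring the order in the entry above. */
    static Kind detect(String term) {
        try {
            Integer.parseInt(term);
            return Kind.INT;
        } catch (NumberFormatException ignored) {
            // not an int; fall through to long
        }
        try {
            Long.parseLong(term);
            return Kind.LONG;
        } catch (NumberFormatException ignored) {
            // not a long; fall through to float
        }
        try {
            Float.parseFloat(term);
            return Kind.FLOAT;
        } catch (NumberFormatException ignored) {
            // not numeric at all
        }
        return Kind.STRING; // assumption: non-numeric terms sort as strings
    }
}
```

With only the int and float checks, a value such as 3000000000 would have been treated as a float and lost precision; the long check in the middle avoids that.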
LUCENE-843: Added IndexWriter.setRAMBufferSizeMB(...) to have
-IndexWriter flush whenever the buffered documents are using more
-than the specified amount of RAM. Also added new APIs to Token
-that allow one to set a char[] plus offset and length to specify a
-token (to avoid creating a new String() for each Token). (Mike
-McCandless)
-
LUCENE-963: Add setters to Field to allow for re-using a single
-Field instance during indexing. This is a sizable performance
-gain, especially for small documents. (Mike McCandless)
-
LUCENE-969: Add new APIs to Token, TokenStream and Analyzer to
-permit re-using of Token and TokenStream instances during
-indexing. Changed Token to use a char[] as the store for the
-termText instead of String. This gives faster tokenization
-performance (~10-15%). (Mike McCandless)
-
LUCENE-847: Factored MergePolicy, which determines which merges
-should take place and when, as well as MergeScheduler, which
-determines when the selected merges should actually run, out of
-IndexWriter. The default merge policy is now
-LogByteSizeMergePolicy (see LUCENE-845) and the default merge
-scheduler is now ConcurrentMergeScheduler (see
-LUCENE-870). (Steven Parkes via Mike McCandless)
-
LUCENE-1052: Add IndexReader.setTermInfosIndexDivisor(int) method
-that allows you to reduce memory usage of the termInfos by further
-sub-sampling (over the termIndexInterval that was used during
-indexing) which terms are loaded into memory. (Chuck Williams,
-Doug Cutting via Mike McCandless)
-
LUCENE-743: Add IndexReader.reopen() method that re-opens an
-existing IndexReader (see the fuller LUCENE-743 entry below). (Michael Busch)
-
LUCENE-1062: Add setData(byte[] data),
-setData(byte[] data, int offset, int length), getData(), getOffset()
-and clone() methods to o.a.l.index.Payload. Also add the field name
-as arg to Similarity.scorePayload(). (Michael Busch)
-
LUCENE-982: Add IndexWriter.optimize(int maxNumSegments) method to
-"partially optimize" an index down to maxNumSegments segments. (Mike McCandless)
-
LUCENE-1080: Changed Token.DEFAULT_TYPE to be public.
-
-
LUCENE-1064: Changed TopDocs constructor to be public. (Shai Erera via Michael Busch)
-
LUCENE-1079: DocValues cleanup: constructor now has no params,
-and getInnerArray() now throws UnsupportedOperationException (Doron Cohen)
-
LUCENE-1089: Added PriorityQueue.insertWithOverflow, which returns
-the Object (if any) that was bumped from the queue to allow
-re-use. (Shai Erera via Mike McCandless)
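
A bounded queue with an insertWithOverflow-style method might look like the sketch below. It is built on java.util.PriorityQueue rather than Lucene's own PriorityQueue, and the class name and exact return convention (null while there is room, otherwise whichever element did not make it into the queue) are assumptions of the sketch.

```java
import java.util.Comparator;
import java.util.PriorityQueue;

final class BoundedQueueSketch<T> {
    private final PriorityQueue<T> heap; // least element at the head
    private final int maxSize;

    BoundedQueueSketch(int maxSize, Comparator<? super T> comparator) {
        if (maxSize < 1) {
            throw new IllegalArgumentException("maxSize must be >= 1");
        }
        this.heap = new PriorityQueue<>(maxSize, comparator);
        this.maxSize = maxSize;
    }

    /**
     * Adds the element if there is room or if it beats the current least element.
     * Returns the object that was bumped (possibly the argument itself), or null.
     */
    T insertWithOverflow(T element) {
        if (heap.size() < maxSize) {
            heap.add(element);
            return null;
        }
        T least = heap.peek();
        if (heap.comparator().compare(element, least) > 0) {
            heap.poll();
            heap.add(element);
            return least;   // caller may re-use this instance
        }
        return element;     // not competitive; caller may re-use it directly
    }

    int size() {
        return heap.size();
    }
}
```

Returning the bumped object lets a caller recycle it for the next candidate instead of allocating a new one per insert.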
-
LUCENE-1101: Token reuse 'contract' (defined in LUCENE-969)
-modified so that it is the token producer's responsibility
-to call Token.clear(). (Doron Cohen)
-
LUCENE-1118: Changed StandardAnalyzer to skip too-long (default >
-255 characters) tokens. You can increase this limit by calling
-StandardAnalyzer.setMaxTokenLength(...). (Michael McCandless)
LUCENE-933: QueryParser fixed to not produce empty sub
-BooleanQueries "()" even if the Analyzer produced no
-tokens for input. (Doron Cohen)
-
LUCENE-955: Fixed SegmentTermPositions to work correctly with the
-first term in the dictionary. (Michael Busch)
-
LUCENE-951: Fixed NullPointerException in MultiLevelSkipListReader
-that was thrown after a call of TermPositions.seek(). (Rich Johnson via Michael Busch)
-
LUCENE-938: Fixed cases where an unhandled exception in
-IndexWriter's methods could cause deletes to be lost. (Steven Parkes via Mike McCandless)
-
LUCENE-962: Fixed case where an unhandled exception in
-IndexWriter.addDocument or IndexWriter.updateDocument could cause
-unreferenced files in the index to not be deleted (Steven Parkes via Mike McCandless)
-
LUCENE-957: RAMDirectory fixed to properly handle directories
-larger than Integer.MAX_VALUE. (Doron Cohen)
-
LUCENE-781: MultiReader fixed to not throw NPE if isCurrent(),
-isOptimized() or getVersion() is called. Separated MultiReader
-into two classes: MultiSegmentReader extends IndexReader, is
-package-protected and is created automatically by IndexReader.open()
-in case the index has multiple segments. The public MultiReader
-now extends MultiSegmentReader and is intended to be used by users
-who want to add their own subreaders. (Daniel Naber, Michael Busch)
-
LUCENE-970: FilterIndexReader now implements isOptimized(). Previously,
-a call to isOptimized() would throw an NPE. (Michael Busch)
-
LUCENE-832: ParallelReader fixed to not throw NPE if isCurrent(),
-isOptimized() or getVersion() is called. (Michael Busch)
-
LUCENE-948: Fix FileNotFoundException caused by stale NFS client
-directory listing caches when writers on different machines are
-sharing an index over NFS and using a custom deletion policy (Mike
-McCandless)
-
LUCENE-978: Ensure TermInfosReader, FieldsReader, and FieldsReader
-close any streams they had opened if an exception is hit in the
-constructor. (Ning Li via Mike McCandless)
-
LUCENE-985: If an extremely long term is in a doc (> 16383 chars),
-we now throw an IllegalArgumentException saying the term is too
-long, instead of cryptic ArrayIndexOutOfBoundsException. (Karl
-Wettin via Mike McCandless)
-
LUCENE-991: The explain() method of BoostingTermQuery had errors
-when no payloads were present on a document. (Peter Keegan via
-Grant Ingersoll)
-
LUCENE-992: Fixed IndexWriter.updateDocument to be atomic again
-(this was broken by LUCENE-843). (Ning Li via Mike McCandless)
-
LUCENE-1008: Fixed corruption case when document with no term
-vector fields is added after documents with term vector fields.
-This bug was introduced with LUCENE-843. (Grant Ingersoll via
-Mike McCandless)
-
LUCENE-1006: Fixed QueryParser to accept a "" field value (zero
-length quoted string.) (yonik)
-
LUCENE-1010: Fixed corruption case when document with no term
-vector fields is added after documents with term vector fields.
-This case is hit during merge and would cause an EOFException.
-This bug was introduced with LUCENE-984. (Andi Vajda via Mike
-McCandless)
-
LUCENE-1009: Fix merge slowdown with LogByteSizeMergePolicy when
-autoCommit=false and documents are using stored fields and/or term
-vectors. (Mark Miller via Mike McCandless)
-
LUCENE-1011: Fixed corruption case when two or more machines,
-sharing an index over NFS, can be writers in quick succession. (Patrick Kimber via Mike McCandless)
-
LUCENE-1028: Fixed Weight serialization for a few queries:
-DisjunctionMaxQuery, ValueSourceQuery, CustomScoreQuery.
-Serialization check added for all queries. (Kyle Maxwell via Doron Cohen)
-
LUCENE-1048: Fixed incorrect behavior in Lock.obtain(...) when the
-timeout argument is very large (eg Long.MAX_VALUE). Also added
-Lock.LOCK_OBTAIN_WAIT_FOREVER constant to never timeout. (Nikolay
-Diakov via Mike McCandless)
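
One way to keep a polling lock loop safe for very large timeouts is to do all of the arithmetic in long and never add the timeout to a clock value. The sketch below shows that approach; it is not the Lock.obtain() implementation, and the class, the BooleanSupplier callback and the WAIT_FOREVER sentinel value are hypothetical.

```java
import java.util.function.BooleanSupplier;

final class LockObtainSketch {
    /** Hypothetical sentinel meaning "never time out". */
    static final long WAIT_FOREVER = -1L;

    /** Polls tryLock until it succeeds or the timeout (in ms) is used up. */
    static boolean obtain(BooleanSupplier tryLock, long timeoutMs, long pollMs) {
        if (pollMs <= 0) {
            throw new IllegalArgumentException("pollMs must be positive");
        }
        if (timeoutMs < 0 && timeoutMs != WAIT_FOREVER) {
            throw new IllegalArgumentException("timeoutMs must be >= 0 or WAIT_FOREVER");
        }
        // Stays a long: narrowing to int or adding to currentTimeMillis() could overflow
        // for values such as Long.MAX_VALUE.
        long maxSleeps = timeoutMs / pollMs;
        long sleeps = 0;
        while (!tryLock.getAsBoolean()) {
            if (timeoutMs != WAIT_FOREVER && sleeps++ >= maxSleeps) {
                return false;
            }
            try {
                Thread.sleep(pollMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return true;
    }
}
```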
-
LUCENE-1050: Throw LockReleaseFailedException in
-Simple/NativeFSLockFactory if we fail to delete the lock file when
-releasing the lock. (Nikolay Diakov via Mike McCandless)
-
LUCENE-1071: Fixed SegmentMerger to correctly set payload bit in
-the merged segment. (Michael Busch)
-
LUCENE-1042: Remove throwing of IOException in getTermFreqVector(int, String, TermVectorMapper) to be consistent
-with other getTermFreqVector calls. Also removed the throwing of the other IOException in that method to be consistent. (Karl Wettin via Grant Ingersoll)
-
LUCENE-1096: Fixed Hits behavior when hits' docs are deleted
-along with iterating the hits. Deleting docs already retrieved
-now works seamlessly. If docs not yet retrieved are deleted
-(e.g. from another thread), and then, relying on the initial
-Hits.length(), an application attempts to retrieve more hits
-than actually exist, a ConcurrentModificationException
-is thrown. (Doron Cohen)
-
LUCENE-1068: Changed StandardTokenizer to fix an issue with it marking
- the type of some tokens incorrectly. This is done by adding a new flag named
- replaceInvalidAcronym which defaults to false, the current, incorrect behavior. Setting
- this flag to true fixes the problem. This flag is a temporary fix and is already
- marked as being deprecated. 3.x will implement the correct approach. (Shai Erera via Grant Ingersoll)
 LUCENE-1140: Fixed NPE caused by LUCENE-1068 (Alexei Dets via Grant Ingersoll)
-
LUCENE-749: ChainedFilter behavior fixed when logic of
-first filter is ANDNOT. (Antonio Bruno via Doron Cohen)
-
LUCENE-508: Make sure SegmentTermEnum.prev() is accurate (= last
-term) after next() returns false. (Steven Tamm via Mike
-McCandless)
LUCENE-906: Elision filter for French. (Mathieu Lecarme via Otis Gospodnetic)
-
LUCENE-960: Added a SpanQueryFilter and related classes to allow for
-not only filtering, but knowing where in a Document a Filter matches (Grant Ingersoll)
-
LUCENE-868: Added new Term Vector access features. New callback
-mechanism allows application to define how and where to read Term
-Vectors from disk. This implementation contains several extensions
-of the new abstract TermVectorMapper class. The new API should be
-back-compatible. No changes in the actual storage of Term Vectors
-have taken place.
-
-
LUCENE-1038: Added setDocumentNumber() method to TermVectorMapper
- to provide information about what document is being accessed. (Karl Wettin via Grant Ingersoll)
-
LUCENE-975: Added PositionBasedTermVectorMapper that allows for
-position based lookup of term vector information.
-See LUCENE-868 above.
-
-
LUCENE-1011: Added simple tools (all in org.apache.lucene.store)
-to verify that locking is working properly. LockVerifyServer runs
-a separate server to verify locks. LockStressTest runs a simple
-tool that rapidly obtains and releases locks.
-VerifyingLockFactory is a LockFactory that wraps any other
-LockFactory and consults the LockVerifyServer whenever a lock is
-obtained or released, throwing an exception if an illegal lock
-obtain occurred. (Patrick Kimber via Mike McCandless)
-
LUCENE-1015: Added FieldCache extension (ExtendedFieldCache) to
-support doubles and longs. Added support into SortField for sorting
-on doubles and longs as well. (Grant Ingersoll)
-
LUCENE-1020: Created basic index checking & repair tool
-(o.a.l.index.CheckIndex). When run without -fix it does a
-detailed test of all segments in the index and reports summary
-information and any errors it hit. With -fix it will remove
-segments that had errors. (Mike McCandless)
-
LUCENE-743: Add IndexReader.reopen() method that re-opens an
-existing IndexReader by only loading those portions of an index
-that have changed since the reader was (re)opened. reopen() can
-be significantly faster than open(), depending on the amount of
-index changes. SegmentReader, MultiSegmentReader, MultiReader,
-and ParallelReader implement reopen(). (Michael Busch)
-
LUCENE-1040: CharArraySet useful for efficiently checking
-set membership of text specified by char[]. (yonik)
-
LUCENE-1073: Created SnapshotDeletionPolicy to facilitate taking a
-live backup of an index without pausing indexing. (Mike
-McCandless)
-
LUCENE-1019: CustomScoreQuery enhanced to support multiple
-ValueSource queries. (Kyle Maxwell via Doron Cohen)
-
LUCENE-1095: Added an option to StopFilter to increase
-positionIncrement of the token succeeding a stopped token.
-Disabled by default. Similar option added to QueryParser
-to consider token positions when creating PhraseQuery
-and MultiPhraseQuery. Disabled by default (so by default
-the query parser ignores position increments). (Doron Cohen)
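
The effect of the StopFilter option in LUCENE-1095 can be illustrated with a small stand-alone sketch. It assumes every input token arrives with a position increment of 1 and uses hypothetical names; it is not the StopFilter code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

final class StopPositionsSketch {
    /**
     * Returns the position increment of each surviving token. With the option enabled,
     * a token that follows removed stop words carries the gap they left behind.
     */
    static int[] positionIncrements(List<String> tokens, Set<String> stopWords,
                                    boolean enablePositionIncrements) {
        List<Integer> increments = new ArrayList<>();
        int skipped = 0;
        for (String token : tokens) {
            if (stopWords.contains(token)) {
                skipped++;          // remember the removed position
                continue;
            }
            increments.add(enablePositionIncrements ? 1 + skipped : 1);
            skipped = 0;
        }
        int[] result = new int[increments.size()];
        for (int i = 0; i < result.length; i++) {
            result[i] = increments.get(i);
        }
        return result;
    }
}
```

With the option disabled (the default), "quick" and "fox" in "the quick the the fox" look adjacent; with it enabled, the gaps left by the removed words are preserved.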
LUCENE-937: CachingTokenFilter now uses an iterator to access the
-Tokens that are cached in the LinkedList. This increases performance
-significantly, especially when the number of Tokens is large. (Mark Miller via Michael Busch)
-
LUCENE-843: Substantial optimizations to improve how IndexWriter
-uses RAM for buffering documents and to speed up indexing (2X-8X
-faster). A single shared hash table now records the in-memory
-postings per unique term and is directly flushed into a single
-segment. (Mike McCandless)
-
LUCENE-892: Fixed extra "buffer to buffer copy" that sometimes
-takes place when using compound files. (Mike McCandless)
-
LUCENE-959: Remove synchronization in Document (yonik)
-
LUCENE-963: Add setters to Field to allow for re-using a single
-Field instance during indexing. This is a sizable performance
-gain, especially for small documents. (Mike McCandless)
-
LUCENE-939: Check explicitly for boundary conditions in FieldInfos
-and don't rely on exceptions. (Michael Busch)
-
LUCENE-966: Very substantial speedups (~6X faster) for
-StandardTokenizer (StandardAnalyzer) by using JFlex instead of
-JavaCC to generate the tokenizer. (Stanislaw Osinski via Mike McCandless)
-
LUCENE-969: Changed core tokenizers & filters to re-use Token and
-TokenStream instances when possible to improve tokenization
-performance (~10-15%). (Mike McCandless)
-
LUCENE-871: Speedup ISOLatin1AccentFilter (Ian Boston via Mike
-McCandless)
-
LUCENE-986: Refactored SegmentInfos from IndexReader into the new
-subclass DirectoryIndexReader. SegmentReader and MultiSegmentReader
-now extend DirectoryIndexReader and are the only IndexReader
-implementations that use SegmentInfos to access an index and
-acquire a write lock for index modifications. (Michael Busch)
-
LUCENE-1007: Allow flushing in IndexWriter to be triggered by
-either RAM usage or document count or both (whichever comes
-first), by adding symbolic constant DISABLE_AUTO_FLUSH to disable
-one of the flush triggers. (Ning Li via Mike McCandless)
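
The "whichever comes first" rule in LUCENE-1007 can be sketched as below. The class is hypothetical, the sentinel value used for DISABLE_AUTO_FLUSH is an assumption of the sketch, and rejecting the case where both triggers are disabled is likewise an assumption rather than a statement about IndexWriter.

```java
final class FlushTriggerSketch {
    /** Sentinel used by this sketch to switch one trigger off. */
    static final int DISABLE_AUTO_FLUSH = -1;

    private final int maxBufferedDocs;
    private final double ramBufferSizeMB;

    FlushTriggerSketch(int maxBufferedDocs, double ramBufferSizeMB) {
        if (maxBufferedDocs == DISABLE_AUTO_FLUSH && ramBufferSizeMB == DISABLE_AUTO_FLUSH) {
            throw new IllegalArgumentException("at least one flush trigger must stay enabled");
        }
        this.maxBufferedDocs = maxBufferedDocs;
        this.ramBufferSizeMB = ramBufferSizeMB;
    }

    /** True as soon as either enabled trigger is reached. */
    boolean shouldFlush(int bufferedDocs, long ramBytesUsed) {
        boolean byDocs = maxBufferedDocs != DISABLE_AUTO_FLUSH
                && bufferedDocs >= maxBufferedDocs;
        boolean byRam = ramBufferSizeMB != DISABLE_AUTO_FLUSH
                && ramBytesUsed >= ramBufferSizeMB * 1024 * 1024;
        return byDocs || byRam;
    }
}
```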
-
LUCENE-1043: Speed up merging of stored fields by bulk-copying the
-raw bytes for each contiguous range of non-deleted documents. (Robert Engels via Mike McCandless)
-
LUCENE-693: Speed up nested conjunctions (~2x) that match many
-documents, and a slight performance increase for top level
-conjunctions. (yonik)
-
LUCENE-1098: Make inner class StandardAnalyzer.SavedStreams static
-and final. (Nathan Beyer via Michael Busch)
LUCENE-1051: Generate separate javadocs for core, demo and contrib
-classes, as well as a unified view. Also add an appropriate menu
-structure to the website. (Michael Busch)
-
LUCENE-746: Fix error message in AnalyzingQueryParser.getPrefixQuery. (Ronnie Kolehmainen via Michael Busch)
LUCENE-908: Improvements and simplifications for how the MANIFEST
-file and the META-INF dir are created. (Michael Busch)
-
LUCENE-935: Various improvements for the maven artifacts. Now the
-artifacts also include the sources as .jar files. (Michael Busch)
-
Added apply-patch target to top-level build. Defaults to looking for
-a patch in ${basedir}/../patches with name specified by -Dpatch.name.
-Can also specify any location by -Dpatch.file property on the command
-line. This should be helpful for easy application of patches, but it
-is also a step towards integrating automatic patch application with
-JIRA and Hudson, and is thus subject to change. (Grant Ingersoll)
-
LUCENE-935: Defined property "m2.repository.url" to allow setting
-the url to a maven remote repository to deploy to. (Michael Busch)
-
LUCENE-1051: Include javadocs in the maven artifacts. (Michael Busch)
-
LUCENE-1055: Remove gdata-server from build files and its sources
-from trunk. (Michael Busch)
-
LUCENE-935: Allow deploying maven artifacts to a remote m2 repository
-via scp and ssh authentication. (Michael Busch)
-
LUCENE-1123: Allow overriding the specification version for
-MANIFEST.MF (Michael Busch)
LUCENE-793: created new exceptions and added them to throws clause
-for many methods (all subclasses of IOException for backwards
-compatibility): index.StaleReaderException,
-index.CorruptIndexException, store.LockObtainFailedException.
-This was done to better call out the possible root causes of an
-IOException from these methods. (Mike McCandless)
-
LUCENE-811: make SegmentInfos class, plus a few methods from related
-classes, package-private again (they were unnecessarily made public
-as part of LUCENE-701). (Mike McCandless)
-
LUCENE-710: added optional autoCommit boolean to IndexWriter
-constructors. When this is false, index changes are not committed
-until the writer is closed. This gives explicit control over when
-a reader will see the changes. Also added optional custom
-deletion policy to explicitly control when prior commits are
-removed from the index. This is intended to allow applications to
-share an index over NFS by customizing when prior commits are
-deleted. (Mike McCandless)
-
LUCENE-818: changed most public methods of IndexWriter,
-IndexReader (and its subclasses), FieldsReader and RAMDirectory to
-throw AlreadyClosedException if they are accessed after being
-closed. (Mike McCandless)
-
LUCENE-834: Changed some access levels for certain Span classes to allow them
-to be overridden. They have been marked expert only and not for public
-consumption. (Grant Ingersoll)
-
LUCENE-796: Removed calls to super.* from various get*Query methods in
-MultiFieldQueryParser, in order to allow sub-classes to override them. (Steven Parkes via Otis Gospodnetic)
-
LUCENE-857: Removed caching from QueryFilter and deprecated QueryFilter
-in favour of QueryWrapperFilter or QueryWrapperFilter + CachingWrapperFilter
-combination when caching is desired. (Chris Hostetter, Otis Gospodnetic)
-
LUCENE-869: Changed FSIndexInput and FSIndexOutput to inner classes of FSDirectory
-to enable extensibility of these classes. (Michael Busch)
-
LUCENE-580: Added the public method reset() to TokenStream. This method does
-nothing by default, but may be overridden by subclasses to support consuming
-the TokenStream more than once. (Michael Busch)
-
LUCENE-580: Added a new constructor to Field that takes a TokenStream as
-argument, available as tokenStreamValue(). This avoids the need for
-"dummy analyzers" for pre-analyzed fields. (Karl Wettin, Michael Busch)
-
LUCENE-730: Added the new methods setAllowDocsOutOfOrder() and
-getAllowDocsOutOfOrder() to BooleanQuery. Deprecated the methods
-setUseScorer14() and getUseScorer14(). The optimization patch LUCENE-730
-(see the LUCENE-730 optimization entry) improves performance for certain
-queries but results in scoring out of docid order. This patch reverses that
-change, so by default hit docs are now scored in docid order unless
-setAllowDocsOutOfOrder(true) is explicitly called.
-This patch also re-enables the tests in QueryUtils that check for docid
-order. (Paul Elschot, Doron Cohen, Michael Busch)
-
LUCENE-888: Added Directory.openInput(File path, int bufferSize)
-to optionally specify the size of the read buffer. Also added
-BufferedIndexInput.setBufferSize(int) to change the buffer size. (Mike McCandless)
-
LUCENE-923: Make SegmentTermPositionVector package-private. It does not need
-to be public because it implements the public interface TermPositionVector. (Michael Busch)
LUCENE-804: Fixed build.xml to pack a fully compilable src dist. (Doron Cohen)
-
LUCENE-813: Leading wildcard fixed to work with trailing wildcard.
-Query parser modified to create a prefix query only for the case
-that there is a single trailing wildcard (and no additional wildcard
-or '?' in the query text). (Doron Cohen)
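
The rule in LUCENE-813 (prefix query only for a single trailing wildcard) can be sketched as a small classifier. The names are hypothetical and the sketch ignores backslash escaping, which the real query parser has to account for.

```java
final class WildcardKindSketch {
    enum Kind { TERM, PREFIX, WILDCARD }

    /** Classifies query text by where its first '*' or '?' appears. */
    static Kind classify(String text) {
        int firstWildcard = -1;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '*' || c == '?') {
                firstWildcard = i;
                break;
            }
        }
        if (firstWildcard < 0) {
            return Kind.TERM;
        }
        // Prefix only if the sole wildcard is a '*' in the last position.
        boolean singleTrailingStar = text.length() > 1
                && firstWildcard == text.length() - 1
                && text.charAt(firstWildcard) == '*';
        return singleTrailingStar ? Kind.PREFIX : Kind.WILDCARD;
    }
}
```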
-
LUCENE-812: Add no-argument constructors to NativeFSLockFactory
-and SimpleFSLockFactory. This enables all 4 builtin LockFactory
-implementations to be specified via the System property
-org.apache.lucene.store.FSDirectoryLockFactoryClass. (Mike McCandless)
-
LUCENE-821: The new single-norm-file introduced by LUCENE-756
-failed to reduce the number of open descriptors since it was still
-opened once per field with norms. (yonik)
-
LUCENE-823: Make sure internal file handles are closed when
-hitting an exception (eg disk full) while flushing deletes in
-IndexWriter's mergeSegments, and also during
-IndexWriter.addIndexes. (Mike McCandless)
-
LUCENE-825: If directory is removed after
-FSDirectory.getDirectory() but before IndexReader.open you now get
-a FileNotFoundException like Lucene pre-2.1 (before this fix you
-got an NPE). (Mike McCandless)
-
LUCENE-800: Removed backslash from the TERM_CHAR list in the queryparser,
-because the backslash is the escape character. Also changed the ESCAPED_CHAR
-list to contain all possible characters, because every character that
-follows a backslash should be considered as escaped. (Michael Busch)
-
LUCENE-372: QueryParser.parse() now ensures that the entire input string
-is consumed. Now a ParseException is thrown if a query contains too many
-closing parentheses. (Andreas Neumann via Michael Busch)
-
LUCENE-814: javacc build targets now fix line-end-style of generated files.
-Now also deleting all javacc generated files before calling javacc. (Steven Parkes, Doron Cohen)
-
LUCENE-829: close readers in contrib/benchmark. (Karl Wettin, Doron Cohen)
-
LUCENE-828: Minor fix for Term's equals(). (Paul Cowan via Otis Gospodnetic)
-
LUCENE-846: Fixed: if IndexWriter is opened with autoCommit=false,
-and you call addIndexes, and hit an exception (eg disk full) then
-when IndexWriter rolls back its internal state this could corrupt
-the instance of IndexWriter (but, not the index itself) by
-referencing already deleted segments. This bug was only present
-in 2.2 (trunk), ie was never released. (Mike McCandless)
-
LUCENE-736: Sloppy phrase query with repeating terms matches wrong docs.
-For example query "B C B"~2 matches the doc "A B C D E". (Doron Cohen)
-
LUCENE-789: Fixed: custom similarity is ignored when using MultiSearcher (problem reported
-by Alexey Lef). Now the similarity applied by MultiSearcher.setSimilarity(sim) is being used.
-Note that, as before this fix, creating a MultiSearcher from Searchers for which a custom similarity
-was set has no effect: it is masked by the similarity of the MultiSearcher. This is as
-designed, because MultiSearcher operates on Searchables (not Searchers). (Doron Cohen)
-
LUCENE-880: Fixed DocumentWriter to close the TokenStreams after it
-has written the postings. Then the resources associated with the
-TokenStreams can safely be released. (Michael Busch)
LUCENE-881: QueryParser.escape() now also escapes the characters
-'|' and '&' which are part of the queryparser syntax. (Michael Busch)
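
A backslash-escaping helper in the spirit of QueryParser.escape() could look like the sketch below. The class is hypothetical and the list of special characters is an assumption based on the query syntax, with '|' and '&' included as described above.

```java
final class QueryEscapeSketch {
    // Assumed set of query syntax characters, including '|' and '&'.
    private static final String SPECIAL = "\\+-!():^[]\"{}~*?|&";

    /** Prefixes every special character with a backslash. */
    static String escape(String s) {
        StringBuilder sb = new StringBuilder(s.length() + 8);
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (SPECIAL.indexOf(c) >= 0) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }
}
```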
-
LUCENE-886: Spellchecker clean up: exceptions are no longer printed to STDERR
-and ignored, but are re-thrown. Some javadoc improvements. (Daniel Naber)
-
LUCENE-698: FilteredQuery now takes the query boost into account for
-scoring. (Michael Busch)
-
LUCENE-763: Spellchecker: LuceneDictionary used to skip first word in
-enumeration. (Christian Mallwitz via Daniel Naber)
-
LUCENE-903: FilteredQuery explanation inaccuracy with boost.
-Explanation tests now "deep" check the explanation details. (Chris Hostetter, Doron Cohen)
-
LUCENE-912: DisjunctionMaxScorer first skipTo(target) call ignores the
-skip target param and ends up at the first match. (Sudaakeran B. via Chris Hostetter & Doron Cohen)
-
LUCENE-913: Two consecutive score() calls return different
-scores for Boolean Queries. (Michael Busch, Doron Cohen)
-
LUCENE-1013: Fix IndexWriter.setMaxMergeDocs to work "out of the
-box", again, by moving set/getMaxMergeDocs up from
-LogDocMergePolicy into LogMergePolicy. This fixes the API
-breakage (non backwards compatible change) caused by LUCENE-994. (Yonik Seeley via Mike McCandless)
LUCENE-759: Added two n-gram-producing TokenFilters. (Otis Gospodnetic)
-
LUCENE-822: Added FieldSelector capabilities to Searchable for use with
-RemoteSearcher, and other Searchable implementations. (Mark Miller, Grant Ingersoll)
-
LUCENE-755: Added the ability to store arbitrary binary metadata in the posting list.
-These metadata are called Payloads. For every position of a Token one Payload in the form
-of a variable length byte array can be stored in the prox file.
-Remark: The APIs introduced with this feature are in experimental state and thus
- contain appropriate warnings in the javadocs. (Michael Busch)
-
LUCENE-834: Added BoostingTermQuery which can boost scores based on the
-values of a payload (see LUCENE-755 above). (Grant Ingersoll)
-
LUCENE-834: Similarity has a new method for scoring payloads called
-scorePayload that can be overridden to take advantage of payload
-storage (see LUCENE-755 above)
-
LUCENE-834: Added isPayloadAvailable() onto TermPositions interface and
-implemented it in the appropriate places (Grant Ingersoll)
-
LUCENE-853: Added RemoteCachingWrapperFilter to enable caching of Filters
-on the remote side of the RMI connection. (Matt Ericson via Otis Gospodnetic)
-
LUCENE-446: Added Solr's search.function for scores based on field
-values, plus CustomScoreQuery for simple score (post) customization. (Yonik Seeley, Doron Cohen)
-
LUCENE-1058: Added new TeeTokenFilter (like the UNIX 'tee' command) and SinkTokenizer which can be used to share tokens between two or more
-Fields such that the other Fields do not have to go through the whole Analysis process over again. For instance, if you have two
-Fields that share all the same analysis steps except one lowercases tokens and the other does not, you can coordinate the operations
-between the two using the TeeTokenFilter and the SinkTokenizer. See TeeSinkTokenTest.java for examples. (Grant Ingersoll, Michael Busch, Yonik Seeley)
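
The tee/sink idea can be sketched with plain iterators: the first field consumes the token stream through a tee that also records each token, and the second field replays the recorded tokens with its one extra step (lowercasing here). The types below are hypothetical stand-ins, not the TeeTokenFilter and SinkTokenizer classes.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Locale;

final class TeeSinkSketch {
    /** Passes tokens through unchanged while recording each one in the sink. */
    static final class Tee implements Iterator<String> {
        private final Iterator<String> input;
        private final List<String> sink;

        Tee(Iterator<String> input, List<String> sink) {
            this.input = input;
            this.sink = sink;
        }

        @Override
        public boolean hasNext() {
            return input.hasNext();
        }

        @Override
        public String next() {
            String token = input.next();
            sink.add(token); // second field will read from here
            return token;
        }
    }

    /** Consumes a token stream, as the first field would. */
    static List<String> drain(Iterator<String> tokens) {
        List<String> out = new ArrayList<>();
        while (tokens.hasNext()) {
            out.add(tokens.next());
        }
        return out;
    }

    /** Replays the sink for the second field, applying only the extra lowercase step. */
    static List<String> lowercased(List<String> sink) {
        List<String> out = new ArrayList<>(sink.size());
        for (String token : sink) {
            out.add(token.toLowerCase(Locale.ROOT));
        }
        return out;
    }
}
```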
LUCENE-761: The proxStream is now cloned lazily in SegmentTermPositions
-when nextPosition() is called for the first time. This allows using instances
-of SegmentTermPositions instead of SegmentTermDocs without additional costs. (Michael Busch)
-
LUCENE-431: RAMInputStream and RAMOutputStream extend IndexInput and
-IndexOutput directly now. This avoids further buffering and thus avoids
-unnecessary array copies. (Michael Busch)
-
LUCENE-730: Updated BooleanScorer2 to make use of BooleanScorer in some
-cases and possibly improve scoring performance. Documents can now be
-delivered out-of-order as they are scored (e.g. to HitCollector).
-N.B. A bit of code had to be disabled in QueryUtils in order for
-TestBoolean2 test to keep passing. (Paul Elschot via Otis Gospodnetic)
-
LUCENE-882: Spellchecker doesn't store the ngrams anymore but only indexes
-them to keep the spell index small. (Daniel Naber)
-
LUCENE-430: Delay allocation of the buffer after a clone of BufferedIndexInput.
-Together with LUCENE-888 this allows the buffer size to be adjusted
-dynamically. (Paul Elschot, Michael Busch)
-
LUCENE-888: Increase buffer sizes inside CompoundFileWriter and
-BufferedIndexOutput. Also increase buffer size in
-BufferedIndexInput, but only when used during merging. Together,
-these increases yield 10-18% overall performance gain vs the
-previous 1K defaults. (Mike McCandless)
-
LUCENE-866: Adds multi-level skip lists to the posting lists. This speeds
-up most queries that use skipTo(), especially on big indexes with large posting
-lists. For average AND queries the speedup is about 20%, for queries that
-contain very frequent and very unique terms the speedup can be over 80%. (Michael Busch)
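
The general idea behind multi-level skipping can be shown on an in-memory sorted array of doc ids. In this sketch a skip entry on level L is assumed to exist every interval^L postings, so higher levels take larger jumps before lower levels refine the position. It is a conceptual illustration with hypothetical names and does not reflect the on-disk skip list format.

```java
final class MultiLevelSkipSketch {
    private final int[] docs;     // doc ids, sorted ascending
    private final int interval;   // postings between skip entries on level 1
    private final int levels;     // number of skip levels above the postings

    MultiLevelSkipSketch(int[] docs, int interval, int levels) {
        if (interval < 2 || interval > 64 || levels < 0 || levels > 8) {
            throw new IllegalArgumentException("interval must be 2..64 and levels 0..8");
        }
        this.docs = docs;
        this.interval = interval;
        this.levels = levels;
    }

    /** Returns the index of the first doc id >= target, or docs.length if none. */
    int skipTo(int target) {
        int pos = 0;
        for (int level = levels; level >= 1; level--) {
            long step = stepFor(level);
            // Follow skip entries on this level while they stay before the target.
            while (pos + step < docs.length && docs[(int) (pos + step)] < target) {
                pos += (int) step;
            }
        }
        // Finish with a short linear scan over the postings themselves.
        while (pos < docs.length && docs[pos] < target) {
            pos++;
        }
        return pos;
    }

    private long stepFor(int level) {
        long step = 1;
        for (int i = 0; i < level; i++) {
            step *= interval;
        }
        return step;
    }
}
```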
LUCENE-791 and INFRA-1173: Infrastructure moved the Wiki to
-http://wiki.apache.org/lucene-java/. Updated the links in the docs and
-wherever else I found references. (Grant Ingersoll, Joe Schaefer)
-
LUCENE-807: Fixed the javadoc for ScoreDocComparator.compare() to be
-consistent with java.util.Comparator.compare(): Any integer is allowed to
-be returned instead of only -1/0/1. (Paul Cowan via Michael Busch)
-
LUCENE-875: Solved javadoc warnings & errors under jdk1.4.
-Solved javadoc errors under jdk5 (jars in path for gdata).
-Made "javadocs" target depend on "build-contrib" for first downloading
-contrib jars configured for dynamic download. (Note: when running
-behind a firewall, a firewall prompt might pop up.) (Doron Cohen)
-
LUCENE-740: Added SNOWBALL-LICENSE.txt to the snowball package and a
-remark about the license to NOTICE.TXT. (Steven Parkes via Michael Busch)
-
LUCENE-925: Added analysis package javadocs. (Grant Ingersoll and Doron Cohen)
LUCENE-802: Added LICENSE.TXT and NOTICE.TXT to Lucene jars. (Steven Parkes via Michael Busch)
-
LUCENE-885: "ant test" now includes all contrib tests. The new
-"ant test-core" target can be used to run only the Core (non
-contrib) tests. (Chris Hostetter)
-
LUCENE-900: "ant test" now enables Java assertions (in Lucene packages). (Doron Cohen)
-
LUCENE-894: Add custom build file for binary distributions that includes
-targets to build the demos. (Chris Hostetter, Michael Busch)
-
LUCENE-904: The "package" targets in build.xml now also generate .md5
-checksum files. (Chris Hostetter, Michael Busch)
-
LUCENE-907: Include LICENSE.TXT and NOTICE.TXT in the META-INF dirs of
-demo war, demo jar, and the contrib jars. (Michael Busch)
-
LUCENE-909: Demo targets for running the demo. (Doron Cohen)
-
LUCENE-908: Improves content of MANIFEST file and makes it customizable
-for the contribs. Adds SNOWBALL-LICENSE.txt to META-INF of the snowball
-jar and makes sure that the lucli jar contains LICENSE.txt and NOTICE.txt. (Chris Hostetter, Michael Busch)
-
LUCENE-930: Various contrib building improvements to ensure contrib
-dependencies are met, and test compilation errors fail the build. (Steven Parkes, Chris Hostetter)
-
LUCENE-622: Add ant target and pom.xml files for building maven artifacts
-of the Lucene core and the contrib modules. (Sami Siren, Karl Wettin, Michael Busch)
's' and 't' have been removed from the list of default stopwords
-in StopAnalyzer (also used by StandardAnalyzer). Having e.g. 's'
-as a stopword meant that 's-class' led to the same results as 'class'.
-Note that this problem still exists for 'a', e.g. in 'a-class' as
-'a' continues to be a stopword. (Daniel Naber)
-
LUCENE-478: Updated the list of Unicode code point ranges for CJK
-(now split into CJ and K) in StandardAnalyzer. (John Wang and
-Steven Rowe via Otis Gospodnetic)
-
Modified some CJK Unicode code point ranges in StandardTokenizer.jj,
-and added a few more of them to increase CJK character coverage.
-Also documented some of the ranges. (Otis Gospodnetic)
-
LUCENE-489: Add support for leading wildcard characters (*, ?) to
-QueryParser. Default is to disallow them, as before. (Steven Parkes via Otis Gospodnetic)
-
LUCENE-703: QueryParser changed to default to use of ConstantScoreRangeQuery
-for range queries. Added useOldRangeQuery property to QueryParser to allow
-selection of old RangeQuery class if required. (Mark Harwood)
-
LUCENE-543: WildcardQuery now performs a TermQuery if the provided term
-does not contain a wildcard character (? or *), when previously a
-StringIndexOutOfBoundsException was thrown. (Michael Busch via Erik Hatcher)
-
LUCENE-726: Removed the use of deprecated doc.fields() method and
-Enumeration. (Michael Busch via Otis Gospodnetic)
-
LUCENE-436: Removed finalize() in TermInfosReader and SegmentReader,
-and added a call to enumerators.remove() in TermInfosReader.close().
-The finalize() overrides were added to help with a pre-1.4.2 JVM bug
-that has since been fixed, plus we no longer support pre-1.4.2 JVMs. (Otis Gospodnetic)
-
LUCENE-771: The default location of the write lock is now the
-index directory, and is named simply "write.lock" (without a big
-digest prefix). Neither the "org.apache.lucene.lockDir" nor the
-"java.io.tmpdir" system property is used any longer as the global directory
-for storing lock files, and the LOCK_DIR field of FSDirectory is
-now deprecated. (Mike McCandless)
LUCENE-503: New ThaiAnalyzer and ThaiWordFilter in contrib/analyzers (Samphan Raruenrom via Chris Hostetter)
-
LUCENE-545: New FieldSelector API and associated changes to
-IndexReader and implementations. New Fieldable interface for use
-with the lazy field loading mechanism. (Grant Ingersoll and Chuck
-Williams via Grant Ingersoll)
LUCENE-678: Added NativeFSLockFactory, which implements locking
-using OS native locking (via java.nio.*). (Michael McCandless via
-Yonik Seeley)
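As an illustration of OS native locking via java.nio, here is a minimal sketch. It is not the NativeFSLockFactory implementation: the class name NativeLockSketch, its methods and the lock file name are hypothetical, and only standard java.nio calls are used.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.channels.OverlappingFileLockException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

// Hypothetical helper (not a Lucene class): guards one lock file with an OS-level lock.
class NativeLockSketch {
    private final Path lockFile;
    private FileChannel channel;
    private FileLock lock;

    NativeLockSketch(Path lockFile) {
        this.lockFile = lockFile;
    }

    // Returns true if this instance now holds the lock, false if it is held elsewhere.
    synchronized boolean obtain() {
        if (lock != null) {
            return true;
        }
        try {
            channel = FileChannel.open(lockFile,
                    StandardOpenOption.CREATE, StandardOpenOption.WRITE);
            try {
                lock = channel.tryLock();   // null when another process holds the lock
            } catch (OverlappingFileLockException e) {
                lock = null;                // already locked inside this JVM
            }
            if (lock == null) {
                channel.close();
                channel = null;
            }
            return lock != null;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Releases the lock (if held) and closes the underlying channel.
    synchronized void release() {
        try {
            if (lock != null) {
                lock.release();
                lock = null;
            }
            if (channel != null) {
                channel.close();
                channel = null;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path file = Paths.get(System.getProperty("java.io.tmpdir"), "sketch-write.lock");
        NativeLockSketch first = new NativeLockSketch(file);
        NativeLockSketch second = new NativeLockSketch(file);
        System.out.println("first obtained:  " + first.obtain());
        System.out.println("second obtained: " + second.obtain());
        first.release();
        System.out.println("second after release: " + second.obtain());
        second.release();
    }
}
```

Note that on some platforms closing any channel on a file releases every lock the JVM holds on that file, so a production implementation needs more care than this sketch takes.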
-
LUCENE-544: Added the ability to specify different boosts for
-different fields when using MultiFieldQueryParser (Matt Ericson
-via Otis Gospodnetic)
-
LUCENE-528: New IndexWriter.addIndexesNoOptimize() that doesn't
-optimize the index when adding new segments, only performing
-merges as needed. (Ning Li via Yonik Seeley)
-
LUCENE-573: QueryParser now allows backslash escaping in
-quoted terms and phrases. (Michael Busch via Yonik Seeley)
-
LUCENE-716: QueryParser now allows specification of Unicode
-characters in terms via a unicode escape of the form \uXXXX (Michael Busch via Yonik Seeley)
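To make the two escaping entries above concrete, the following sketch resolves backslash escapes and \uXXXX escapes in a single term. It is an assumption-laden illustration, not QueryParser's code: the class TermEscapeSketch and its unescape method are hypothetical names.

```java
// Hypothetical helper (not Lucene's QueryParser code): resolves backslash escapes in a term.
class TermEscapeSketch {
    static String unescape(String term) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < term.length(); i++) {
            char c = term.charAt(i);
            if (c != '\\') {
                out.append(c);
                continue;
            }
            if (i + 1 >= term.length()) {
                throw new IllegalArgumentException("term cannot end with a backslash");
            }
            char next = term.charAt(++i);
            if (next != 'u') {
                out.append(next);   // plain escape: keep the escaped character literally
                continue;
            }
            // Unicode escape: exactly four hex digits must follow the 'u'.
            if (i + 4 >= term.length()) {
                throw new IllegalArgumentException("truncated unicode escape");
            }
            int codeUnit = 0;
            for (int k = 1; k <= 4; k++) {
                int digit = Character.digit(term.charAt(i + k), 16);
                if (digit < 0) {
                    throw new IllegalArgumentException("non-hex digit in unicode escape");
                }
                codeUnit = codeUnit * 16 + digit;
            }
            out.append((char) codeUnit);
            i += 4;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(unescape("caf\\u00e9"));
        System.out.println(unescape("a\\:b"));
    }
}
```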
-
LUCENE-709: Added RAMDirectory.sizeInBytes(), IndexWriter.ramSizeInBytes()
-and IndexWriter.flushRamSegments(), allowing applications to
-control the amount of memory used to buffer documents. (Chuck Williams via Yonik Seeley)
-
LUCENE-723: QueryParser now parses *:* as MatchAllDocsQuery (Yonik Seeley)
-
LUCENE-741: Command-line utility for modifying or removing norms
-on fields in an existing index. This is mostly based on LUCENE-496
-and lives in contrib/miscellaneous. (Chris Hostetter, Otis Gospodnetic)
-
LUCENE-759: Added NGramTokenizer and EdgeNGramTokenizer classes and
-their passing unit tests. (Otis Gospodnetic)
-
LUCENE-565: Added methods to IndexWriter to more efficiently
-handle updating documents (the "delete then add" use case). This
-is intended to be an eventual replacement for the existing
-IndexModifier. Added IndexWriter.flush() (renamed from
-flushRamSegments()) to flush all pending updates (held in RAM), to
-the Directory. (Ning Li via Mike McCandless)
-
LUCENE-762: Added in SIZE and SIZE_AND_BREAK FieldSelectorResult options
-which allow one to retrieve the size of a field without retrieving the
-actual field. (Chuck Williams via Grant Ingersoll)
-
LUCENE-799: Properly handle lazy, compressed fields. (Mike Klaas via Grant Ingersoll)
LUCENE-438: Remove "final" from Token, implement Cloneable, allow
-changing of termText via setTermText(). (Yonik Seeley)
-
org.apache.lucene.analysis.nl.WordlistLoader has been deprecated
-and is supposed to be replaced with the WordlistLoader class in
-package org.apache.lucene.analysis (Daniel Naber)
-
LUCENE-609: Revert return type of Document.getField(s) to Field
-for backward compatibility, added new Document.getFieldable(s)
-for access to new lazy loaded fields. (Yonik Seeley)
-
LUCENE-608: Document.fields() has been deprecated and a new method
-Document.getFields() has been added that returns a List instead of
-an Enumeration (Daniel Naber)
-
LUCENE-605: New Explanation.isMatch() method and new ComplexExplanation
-subclass allows explain methods to produce Explanations which model
-"matching" independent of having a positive value. (Chris Hostetter)
-
LUCENE-621: New static methods IndexWriter.setDefaultWriteLockTimeout
-and IndexWriter.setDefaultCommitLockTimeout for overriding default
-timeout values for all future instances of IndexWriter (as well
-as for any other classes that may reference the static values,
-ie: IndexReader). (Michael McCandless via Chris Hostetter)
-
LUCENE-638: FSDirectory.list() now only returns the directory's
-Lucene-related files. Thanks to this change one can now construct
-a RAMDirectory from a file system directory that contains files
-not related to Lucene. (Simon Willnauer via Daniel Naber)
-
LUCENE-635: Decoupling locking implementation from Directory
-implementation. Added set/getLockFactory to Directory and moved
-all locking code into subclasses of abstract class LockFactory.
-FSDirectory and RAMDirectory still default to their prior locking
-implementations, but now you can mix & match, for example using
-SingleInstanceLockFactory (ie, in-memory locking) with an
-FSDirectory. Note that you must now call setDisableLocks before
-instantiating an FSDirectory if you wish to disable locking
-for that Directory. (Michael McCandless, Jeff Patterson via Yonik Seeley)
-
LUCENE-657: Made FuzzyQuery non-final and inner ScoreTerm protected. (Steven Parkes via Otis Gospodnetic)
-
LUCENE-701: Lockless commits: a commit lock is no longer required
-when a writer commits and a reader opens the index. This includes
-a change to the index file format (see docs/fileformats.html for
-details). It also removes all APIs associated with the commit
-lock & its timeout. Readers are now truly read-only and do not
-block one another on startup. This is the first step to getting
-Lucene to work correctly over NFS (second step is
-LUCENE-710). (Mike McCandless)
-
LUCENE-722: DEFAULT_MIN_DOC_FREQ was misspelled DEFALT_MIN_DOC_FREQ
-in Similarity's MoreLikeThis class. The misspelling has been
-replaced by the correct spelling. (Andi Vajda via Daniel Naber)
-
LUCENE-738: Reduce the size of the file that keeps track of which
-documents are deleted when the number of deleted documents is
-small. This changes the index file format and cannot be
-read by previous versions of Lucene. (Doron Cohen via Yonik Seeley)
-
LUCENE-756: Maintain all norms in a single .nrm file to reduce the
-number of open files and file descriptors for the non-compound index
-format. This changes the index file format, but maintains the
-ability to read and update older indices. The first segment merge
-on an older format index will create a single .nrm file for the new
-segment. (Doron Cohen via Yonik Seeley)
-
LUCENE-732: DateTools support has been added to QueryParser, with
-setters for both the default Resolution, and per-field Resolution.
-For backwards compatibility, DateField is still used if no Resolutions
-are specified. (Michael Busch via Chris Hostetter)
-
Added isOptimized() method to IndexReader. (Otis Gospodnetic)
-
LUCENE-773: Deprecate the FSDirectory.getDirectory(*) methods that
-take a boolean "create" argument. Instead you should use
-IndexWriter's "create" argument to create a new index. (Mike McCandless)
-
LUCENE-780: Add a static Directory.copy() method to copy files
-from one Directory to another. (Jiri Kuhn via Mike McCandless)
-
LUCENE-773: Added Directory.clearLock(String name) to forcefully
-remove an old lock. The default implementation is to ask the
-lockFactory (if non null) to clear the lock. (Mike McCandless)
-
LUCENE-795: Directory.renameFile() has been deprecated as it is
-not used anymore inside Lucene. (Daniel Naber)
Fixed the web application demo (built with "ant war-demo") which
-didn't work because it used a QueryParser method that had
-been removed (Daniel Naber)
-
LUCENE-583: ISOLatin1AccentFilter fails to preserve positionIncrement (Yonik Seeley)
-
LUCENE-575: SpellChecker min score is incorrectly changed by suggestSimilar (Karl Wettin via Yonik Seeley)
-
LUCENE-587: Explanation.toHtml was producing malformed HTML (Chris Hostetter)
-
Fix to allow MatchAllDocsQuery to be used with RemoteSearcher (Yonik Seeley)
-
LUCENE-601: RAMDirectory and RAMFile made Serializable (Karl Wettin via Otis Gospodnetic)
-
LUCENE-557: Fixes to BooleanQuery and FilteredQuery so that the score
-Explanations match up with the real scores. (Chris Hostetter)
-
LUCENE-607: ParallelReader's TermEnum fails to advance properly to
-new fields (Chuck Williams, Christian Kohlschuetter via Yonik Seeley)
-
LUCENE-610,LUCENE-611: Simple syntax changes to allow compilation with ecj:
-disambiguate inner class scorer's use of doc() in BooleanScorer2,
-other test code changes. (DM Smith via Yonik Seeley)
-
LUCENE-451: All core query types now use ComplexExplanations so that
-boosts of zero don't confuse the BooleanWeight explain method. (Chris Hostetter)
LUCENE-641: fixed an off-by-one bug with IndexWriter.setMaxFieldLength() (Daniel Naber)
-
LUCENE-659: Make PerFieldAnalyzerWrapper delegate getPositionIncrementGap()
-to the correct analyzer for the field. (Chuck Williams via Yonik Seeley)
-
LUCENE-650: Fixed NPE in Locale specific String Sort when Document
-has no value. (Oliver Hutchison via Chris Hostetter)
-
LUCENE-683: Fixed data corruption when reading lazy loaded fields. (Yonik Seeley)
-
LUCENE-678: Fixed bug in NativeFSLockFactory which caused the same
-lock to be shared between different directories. (Michael McCandless via Yonik Seeley)
-
LUCENE-690: Fixed thread unsafe use of IndexInput by lazy loaded fields. (Yonik Seeley)
-
LUCENE-696: Fix bug when scorer for DisjunctionMaxQuery has skipTo()
-called on it before next(). (Yonik Seeley)
-
LUCENE-569: Fixed SpanNearQuery bug, for 'inOrder' queries it would fail
-to recognize ordered spans if they overlapped with unordered spans. (Paul Elschot via Chris Hostetter)
-
LUCENE-706: Updated fileformats.xml|html concerning the docdelta value
-in the frequency file. (Johan Stuyts, Doron Cohen via Grant Ingersoll)
-
LUCENE-715: Fixed private constructor in IndexWriter.java to
-properly release the acquired write lock if there is an
-IOException after acquiring the write lock but before finishing
-instantiation. (Matthew Bogosian via Mike McCandless)
-
LUCENE-651: Multiple different threads requesting the same
-FieldCache entry (often for Sorting by a field) at the same
-time caused multiple generations of that entry, which was
-detrimental to performance and memory use. (Oliver Hutchison via Otis Gospodnetic)
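The general idea of building a cache entry only once when several threads ask for it at the same time can be sketched as follows. This is not the FieldCache code and not the fix that was committed; it uses ConcurrentHashMap.computeIfAbsent from current Java, and OncePerKeyCacheSketch is a hypothetical name.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;
import java.util.stream.IntStream;

// Hypothetical cache (not Lucene's FieldCache): each key is loaded at most once.
class OncePerKeyCacheSketch<K, V> {
    private final ConcurrentHashMap<K, V> entries = new ConcurrentHashMap<>();
    private final Function<K, V> loader;
    final AtomicInteger loads = new AtomicInteger();   // counts real loads, for the demo

    OncePerKeyCacheSketch(Function<K, V> loader) {
        this.loader = loader;
    }

    V get(K key) {
        // computeIfAbsent runs the loader atomically, so concurrent callers
        // asking for the same key wait for one load instead of each starting one.
        return entries.computeIfAbsent(key, k -> {
            loads.incrementAndGet();
            return loader.apply(k);
        });
    }

    public static void main(String[] args) {
        OncePerKeyCacheSketch<String, int[]> cache =
                new OncePerKeyCacheSketch<>(field -> new int[1000]);
        IntStream.range(0, 64).parallel().forEach(i -> cache.get("price"));
        System.out.println("loads: " + cache.loads.get());
    }
}
```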
-
LUCENE-717: Fixed build.xml not to fail when there is no lib dir. (Doron Cohen via Otis Gospodnetic)
-
LUCENE-728: Removed duplicate/old MoreLikeThis and SimilarityQueries
-classes from contrib/similarity, as their new home is under
-contrib/queries. (Otis Gospodnetic)
-
LUCENE-669: Do not double-close the RandomAccessFile in
-FSIndexInput/Output during finalize(). Besides sending an
-IOException up to the GC, this may also be the cause of intermittent
-"The handle is invalid" IOExceptions on Windows when trying to
-close readers or writers. (Michael Busch via Mike McCandless)
-
LUCENE-702: Fix IndexWriter.addIndexes(*) to not corrupt the index
-on any exceptions (eg disk full). The semantics of these methods
-is now transactional: either all indices are merged or none are.
-Also fixed IndexWriter.mergeSegments (called outside of
-addIndexes(*) by addDocument, optimize, flushRamSegments) and
-IndexReader.commit() (called by close) to clean up and keep the
-instance state consistent with what's actually in the index (Mike
-McCandless).
-
-
LUCENE-129: Change finalizers to do "try {...} finally
-{super.finalize();}" to make sure we don't miss finalizers in
-classes above us. (Esmond Pitt via Mike McCandless)
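The pattern named in this entry looks roughly like the sketch below. The classes Base and Derived are hypothetical, and the finalizer is called directly only so the effect is visible; finalize() is deprecated in current Java releases, so this only illustrates the idiom.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical classes for illustration; finalize() is deprecated in current Java releases.
class FinalizerChainSketch {
    static final List<String> LOG = new ArrayList<>();
    // Keeps the demo object reachable so the GC does not finalize it a second time.
    static Object keepAlive;

    static class Base {
        @Override
        @SuppressWarnings({"deprecation", "removal"})
        protected void finalize() throws Throwable {
            try {
                LOG.add("Base cleanup");
            } finally {
                super.finalize();
            }
        }
    }

    static class Derived extends Base {
        @Override
        @SuppressWarnings({"deprecation", "removal"})
        protected void finalize() throws Throwable {
            try {
                LOG.add("Derived cleanup");
                throw new IllegalStateException("cleanup failed");
            } finally {
                super.finalize();   // runs Base.finalize() even though cleanup threw
            }
        }
    }

    // Calls the finalizer directly (normally the GC does this) and reports what ran.
    @SuppressWarnings({"deprecation", "removal"})
    static List<String> runDerivedFinalizer() {
        LOG.clear();
        Derived derived = new Derived();
        keepAlive = derived;
        try {
            derived.finalize();
        } catch (Throwable expected) {
            LOG.add("caught: " + expected.getMessage());
        }
        return new ArrayList<>(LOG);
    }

    public static void main(String[] args) {
        System.out.println(runDerivedFinalizer());
    }
}
```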
-
LUCENE-754: Fix a problem introduced by LUCENE-651, causing
-IndexReaders to hang around forever, in addition to not
-fixing the original FieldCache performance problem. (Chris Hostetter, Yonik Seeley)
-
LUCENE-140: Fix IndexReader.deleteDocument(int docNum) to
-correctly raise ArrayIndexOutOfBoundsException when docNum is too
-large. Previously, if docNum was only slightly too large (within
-the same multiple of 8, ie, up to 7 ints beyond maxDoc), no
-exception would be raised and instead the index would become
-silently corrupted. The corruption then only appears much later,
-in mergeSegments, when the corrupted segment is merged with
-segment(s) after it. (Mike McCandless)
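Why a slightly too large docNum could slip through can be sketched with a byte-backed bit set. The layout below, one bit per document in a byte array of (size >> 3) + 1 bytes, is an assumption made for illustration, and DeletedDocsSketch is a hypothetical class rather than Lucene's own.

```java
// Hypothetical bit set (assumed layout: one bit per document, packed into bytes).
class DeletedDocsSketch {
    private final byte[] bits;
    private final int size;   // plays the role of maxDoc

    DeletedDocsSketch(int size) {
        this.size = size;
        this.bits = new byte[(size >> 3) + 1];
    }

    // Relies on the array bounds check only: bits in the last byte that lie
    // beyond size are accepted silently.
    void setUnchecked(int docNum) {
        bits[docNum >> 3] |= 1 << (docNum & 7);
    }

    // Validates docNum against size before touching the array.
    void setChecked(int docNum) {
        if (docNum < 0 || docNum >= size) {
            throw new ArrayIndexOutOfBoundsException(docNum);
        }
        setUnchecked(docNum);
    }

    boolean get(int docNum) {
        return (bits[docNum >> 3] & (1 << (docNum & 7))) != 0;
    }

    public static void main(String[] args) {
        DeletedDocsSketch deleted = new DeletedDocsSketch(10);
        deleted.setUnchecked(12);   // beyond size, yet no exception is raised
        try {
            deleted.setChecked(12);
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("rejected docNum 12");
        }
    }
}
```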
-
LUCENE-768: Fix case where an Exception during deleteDocument,
-undeleteAll or setNorm in IndexReader could leave the reader in a
-state where close() fails to release the write lock. (Mike McCandless)
-
Remove "tvp" from known index file extensions because it is
-never used. (Nicolas Lalevée via Bernhard Messer)
-
LUCENE-767: Change how SegmentReader.maxDoc() is computed to not
-rely on file length check and instead use the SegmentInfo's
-docCount that's already stored explicitly in the index. This is a
-defensive bug fix (ie, there is no known problem seen "in real
-life" due to this, just a possible future problem). (Chuck
-Williams via Mike McCandless)
LUCENE-586: TermDocs.skipTo() is now more efficient for
-multi-segment indexes. This will improve the performance of many
-types of queries against a non-optimized index. (Andrew Hudson
-via Yonik Seeley)
-
LUCENE-623: RAMDirectory.close now nulls out its reference to all
-internal "files", allowing them to be GCed even if references to the
-RAMDirectory itself still exist. (Nadav Har'El via Chris Hostetter)
-
LUCENE-629: Compressed fields are no longer uncompressed and
-recompressed during segment merges (e.g. during indexing or
-optimizing), thus improving performance. (Michael Busch via Otis
-Gospodnetic)
-
LUCENE-388: Improve indexing performance when maxBufferedDocs is
-large by keeping a count of buffered documents rather than
-counting after each document addition. (Doron Cohen, Paul Smith,
-Yonik Seeley)
-
Modified TermScorer.explain to use TermDocs.skipTo() instead of
-looping through docs. (Grant Ingersoll)
-
LUCENE-672: New indexing segment merge policy flushes all
-buffered docs to their own segment and delays a merge until
-mergeFactor segments of a certain level have been accumulated.
-This increases indexing performance in the presence of deleted
-docs or partially full segments as well as enabling future
-optimizations.
-
-NOTE: this also fixes an "under-merging" bug whereby it is
-possible to get far too many segments in your index (which will
-drastically slow down search, risks exhausting file descriptor
-limit, etc.). This can happen when the number of buffered docs
-at close, plus the number of docs in the last non-ram segment is
-greater than mergeFactor. (Ning Li, Yonik Seeley)
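The level-based behaviour described above can be modelled with a toy simulation. This is a deliberately simplified sketch and not the actual merge policy: it assumes that a segment's level is simply the number of merges that produced it, and the class LevelMergeSketch is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model (not Lucene's merge policy): a merge at one level is delayed until
// mergeFactor segments have accumulated at that level.
class LevelMergeSketch {
    private final int mergeFactor;
    // levels.get(i) holds the doc counts of the segments currently at level i.
    private final List<List<Integer>> levels = new ArrayList<>();

    LevelMergeSketch(int mergeFactor) {
        if (mergeFactor < 2) {
            throw new IllegalArgumentException("mergeFactor must be at least 2");
        }
        this.mergeFactor = mergeFactor;
    }

    // Flushes the buffered docs to their own level-0 segment.
    void flush(int bufferedDocs) {
        addSegment(0, bufferedDocs);
    }

    private void addSegment(int level, int docCount) {
        while (levels.size() <= level) {
            levels.add(new ArrayList<>());
        }
        List<Integer> segments = levels.get(level);
        segments.add(docCount);
        if (segments.size() == mergeFactor) {
            int merged = 0;
            for (int count : segments) {
                merged += count;
            }
            segments.clear();
            addSegment(level + 1, merged);   // the merge result moves up one level
        }
    }

    int segmentCount() {
        int total = 0;
        for (List<Integer> segments : levels) {
            total += segments.size();
        }
        return total;
    }

    int docCount() {
        int total = 0;
        for (List<Integer> segments : levels) {
            for (int count : segments) {
                total += count;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        LevelMergeSketch index = new LevelMergeSketch(3);
        for (int i = 0; i < 7; i++) {
            index.flush(10);
        }
        System.out.println(index.segmentCount() + " segments, " + index.docCount() + " docs");
    }
}
```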
-
Lazy loaded fields unnecessarily retained an extra copy of loaded
-String data. (Yonik Seeley)
-
LUCENE-443: ConjunctionScorer performance increase. Speed up
-any BooleanQuery with more than one mandatory clause. (Abdul Chaudhry, Paul Elschot via Yonik Seeley)
-
LUCENE-365: DisjunctionSumScorer performance increase of
-~30%. Speeds up queries with optional clauses. (Paul Elschot via
-Yonik Seeley)
-
LUCENE-695: Optimized BufferedIndexInput.readBytes() for medium
-size buffers, which will speed up merging and retrieving binary
-and compressed fields. (Nadav Har'El via Yonik Seeley)
-
LUCENE-687: Lazy skipping on proximity file speeds up most
-queries involving term positions, including phrase queries. (Michael Busch via Yonik Seeley)
-
LUCENE-714: Replaced 2 cases of manual for-loop array copying
-with calls to System.arraycopy instead, in DocumentWriter.java. (Nicolas Lalevee via Mike McCandless)
-
LUCENE-729: Non-recursive skipTo and next implementation of
-TermDocs for a MultiReader. The old implementation could
-recurse up to the number of segments in the index. (Yonik Seeley)
-
LUCENE-739: Improve segment merging performance by reusing
-the norm array across different fields and doing bulk writes
-of norms of segments with no deleted docs. (Michael Busch via Yonik Seeley)
-
LUCENE-745: Add BooleanQuery.clauses(), allowing direct access
-to the List of clauses and replaced the internal synchronized Vector
-with an unsynchronized List. (Yonik Seeley)
-
LUCENE-750: Remove finalizers from FSIndexOutput and move the
-FSIndexInput finalizer to the actual file so all clones don't
-register a new finalizer. (Yonik Seeley)
Added style sheet to xdocs named lucene.css and included in the
-Anakia VSL descriptor. (Grant Ingersoll)
-
Added scoring.xml document into xdocs. Updated Similarity.java
-scoring formula. (Grant Ingersoll and Steve Rowe. Updates from:
-Michael McCandless, Doron Cohen, Chris Hostetter, Doug Cutting).
-Issue 664.
-
-
Added javadocs for FieldSelectorResult.java. (Grant Ingersoll)
-
Moved xdocs directory to src/site/src/documentation/content/xdocs per
-Issue 707. Site now builds using Forrest, just like the other Lucene
-siblings. See http://wiki.apache.org/jakarta-lucene/HowToUpdateTheWebsite
-for info on updating the website. (Grant Ingersoll with help from Steve Rowe,
-Chris Hostetter, Doug Cutting, Otis Gospodnetic, Yonik Seeley)
-
Added in Developer and System Requirements sections under Resources (Grant Ingersoll)
-
LUCENE-713: Updated the Term Vector section of File Formats to include
-documentation on how Offset and Position info are stored in the TVF file. (Grant Ingersoll, Samir Abdou)
-
Added in link to Clover Test Code Coverage Reports under the Develop
-section in Resources (Grant Ingersoll)
-
LUCENE-748: Added details for semantics of IndexWriter.close on
-hitting an Exception. (Jed Wesley-Smith via Mike McCandless)
-
Added some text about what is contained in releases. (Eric Haszlakiewicz via Grant Ingersoll)
-
LUCENE-758: Fix javadoc to clarify that RAMDirectory(Directory)
-makes a full copy of the starting Directory. (Mike McCandless)
-
LUCENE-764: Fix javadocs to detail temporary space requirements
-for IndexWriter's optimize(), addIndexes(*) and addDocument(...)
-methods. (Mike McCandless)
Added in Clover test code coverage per LUCENE-721.
-To enable clover code coverage, you must have clover.jar in the ANT
-classpath and specify -Drun.clover=true on the command line. (Michael Busch and Grant Ingersoll)
-
Added a sysproperty in common-build.xml per LUCENE-752 to map java.io.tmpdir to
-${build.dir}/test just like the tempDir sysproperty.
-
-
LUCENE-757: Added new target named init-dist that does setup for
-distribution of both binary and source distributions. Called by package
-and package-*-src.
-
All deprecated methods and fields have been removed, except
-DateField, which will still be supported for some time
-so Lucene can read its date fields from old indexes (Yonik Seeley & Grant Ingersoll)
-
DisjunctionSumScorer is no longer public. (Paul Elschot via Otis Gospodnetic)
-
Creating a Field with both an empty name and an empty value
-now throws an IllegalArgumentException (Daniel Naber)
-
LUCENE-301: Added new IndexWriter({String,File,Directory},
-Analyzer) constructors that do not take a boolean "create"
-argument. These new constructors will create a new index if
-necessary, else append to the existing one. (Dan Armbrust via
-Mike McCandless)
LUCENE-330: Fix issue of FilteredQuery not working properly within
-BooleanQuery. (Paul Elschot via Erik Hatcher)
-
LUCENE-515: Make ConstantScoreRangeQuery and ConstantScoreQuery work
-with RemoteSearchable. (Philippe Laflamme via Yonik Seeley)
-
Added methods to get/set writeLockTimeout and commitLockTimeout in
-IndexWriter. These could be set in Lucene 1.4 using a system property.
-This feature had been removed without adding the corresponding
-getter/setter methods. (Daniel Naber)
-
LUCENE-413: Fixed ArrayIndexOutOfBoundsExceptions thrown
-when using SpanQueries. (Paul Elschot via Yonik Seeley)
-
Implemented FilterIndexReader.getVersion() and isCurrent() (Yonik Seeley)
-
LUCENE-540: Fixed a bug with IndexWriter.addIndexes(Directory[])
-that sometimes caused the index order of documents to change. (Yonik Seeley)
-
LUCENE-526: Fixed a bug in FieldSortedHitQueue that caused
-subsequent String sorts with different locales to sort identically. (Paul Cowan via Yonik Seeley)
-
LUCENE-541: Add missing extractTerms() to DisjunctionMaxQuery (Stefan Will via Yonik Seeley)
-
LUCENE-514: Added getTermArrays() and extractTerms() to
-MultiPhraseQuery (Eric Jain & Yonik Seeley)
-
LUCENE-512: Fixed ClassCastException in ParallelReader.getTermFreqVectors (frederic via Yonik)
-
LUCENE-352: Fixed bug in SpanNotQuery that manifested as
-NullPointerException when "exclude" query was not a SpanTermQuery. (Chris Hostetter)
-
LUCENE-572: Fixed bug in SpanNotQuery hashCode, was ignoring exclude clause (Chris Hostetter)
-
LUCENE-561: Fixed some ParallelReader bugs. NullPointerException if the reader
-didn't know about the field yet, reader didn't keep track if it had deletions,
-and deleteDocument calls could circumvent synchronization on the subreaders. (Chuck Williams via Yonik Seeley)
-
LUCENE-556: Added empty extractTerms() implementation to MatchAllDocsQuery and
-ConstantScoreQuery in order to allow their use with a MultiSearcher. (Yonik Seeley)
-
LUCENE-546: Removed 2GB file size limitations for RAMDirectory. (Peter Royal, Michael Chan, Yonik Seeley)
-
LUCENE-485: Don't hold commit lock while removing obsolete index
-files. (Luc Vanlerberghe via cutting)
Note that this release is mostly but not 100% source compatible with
-the previous release of Lucene (1.4.3). In other words, you should
-make sure your application compiles with this version of Lucene before
-you replace the old Lucene JAR with the new one. Many methods have
-been deprecated in anticipation of release 2.0, so deprecation
-warnings are to be expected when upgrading from 1.4.3 to 1.9.
-
-
-
-
The fix that made IndexWriter.setMaxBufferedDocs(1) work had negative
-effects on indexing performance and has thus been reverted. The
-argument for setMaxBufferedDocs(int) must now be at least 2; otherwise
-an exception is thrown. (Daniel Naber)
-
-
-
Optimized BufferedIndexOutput.writeBytes() to use
-System.arraycopy() in more cases, rather than copying byte-by-byte. (Lukas Zapletal via Cutting)
FuzzyQuery can no longer throw a TooManyClauses exception. If a
-FuzzyQuery expands to more than BooleanQuery.maxClauseCount
-terms only the BooleanQuery.maxClauseCount most similar terms
-go into the rewritten query and thus the exception is avoided. (Christoph)
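Keeping only the most similar terms can be done with a bounded priority queue, sketched below. This illustrates the general technique and is not FuzzyQuery's rewrite code: TopSimilarTermsSketch, its method and the similarity values in the demo are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

// Hypothetical helper: keeps the maxClauseCount terms with the highest similarity.
class TopSimilarTermsSketch {
    static List<String> mostSimilar(Map<String, Double> similarityByTerm, int maxClauseCount) {
        // Min-heap on similarity: the head is always the weakest term kept so far.
        PriorityQueue<Map.Entry<String, Double>> heap =
                new PriorityQueue<>(Map.Entry.<String, Double>comparingByValue());
        for (Map.Entry<String, Double> entry : similarityByTerm.entrySet()) {
            heap.offer(entry);
            if (heap.size() > maxClauseCount) {
                heap.poll();   // drop the least similar term instead of failing
            }
        }
        List<String> kept = new ArrayList<>();
        while (!heap.isEmpty()) {
            kept.add(heap.poll().getKey());
        }
        Collections.reverse(kept);   // most similar first
        return kept;
    }

    public static void main(String[] args) {
        Map<String, Double> similarities = new HashMap<>();
        similarities.put("lucent", 0.83);
        similarities.put("lucine", 0.80);
        similarities.put("lucerne", 0.71);
        similarities.put("lusene", 0.60);
        System.out.println(mostSimilar(similarities, 2));
    }
}
```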
-
Changed system property from "org.apache.lucene.lockdir" to
-"org.apache.lucene.lockDir", so that its casing follows the existing
-pattern used in other Lucene system properties. (Bernhard)
-
The terms of RangeQueries and FuzzyQueries are now converted to
-lowercase by default (as has been the case for PrefixQueries
-and WildcardQueries before). Use setLowercaseExpandedTerms(false)
-to disable that behavior but note that this also affects
-PrefixQueries and WildcardQueries. (Daniel Naber)
-
Document frequency that is computed when MultiSearcher is used is now
-computed correctly and "globally" across subsearchers and indices, while
-before it used to be computed locally to each index, which caused
-ranking across multiple indices not to be equivalent. (Chuck Williams, Wolf Siberski via Otis, bug #31841 [LUCENE-295])
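A small numeric sketch shows why local document frequencies make scores from different indices incomparable. It assumes an idf of the form log(numDocs / (docFreq + 1)) + 1; the class GlobalDocFreqSketch and the sample counts are hypothetical.

```java
// Hypothetical helper: compares per-index idf with an idf built from global counts.
class GlobalDocFreqSketch {
    // Assumed idf form: log(numDocs / (docFreq + 1)) + 1.
    static double idf(int docFreq, int numDocs) {
        return Math.log((double) numDocs / (docFreq + 1)) + 1.0;
    }

    // Sums docFreq and document counts over all indices before computing idf,
    // so every subsearcher scores the term with the same weight.
    static double globalIdf(int[] docFreqs, int[] numDocs) {
        int totalDocFreq = 0;
        int totalDocs = 0;
        for (int i = 0; i < docFreqs.length; i++) {
            totalDocFreq += docFreqs[i];
            totalDocs += numDocs[i];
        }
        return idf(totalDocFreq, totalDocs);
    }

    public static void main(String[] args) {
        // The term is rare in index A (1 of 1000 docs) and common in index B (50 of 100).
        System.out.println("local idf, index A: " + idf(1, 1000));
        System.out.println("local idf, index B: " + idf(50, 100));
        System.out.println("global idf:         "
                + globalIdf(new int[] {1, 50}, new int[] {1000, 100}));
    }
}
```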
-
When opening an IndexWriter with create=true, Lucene now only deletes
-its own files from the index directory (looking at the file name suffixes
-to decide if a file belongs to Lucene). The old behavior was to delete
-all files. (Daniel Naber and Bernhard Messer, bug #34695 [LUCENE-385])
-
The version of an IndexReader, as returned by getCurrentVersion()
-and getVersion() doesn't start at 0 anymore for new indexes. Instead, it
-is now initialized by the system time in milliseconds. (Bernhard Messer via Daniel Naber)
-
Several default values cannot be set via system properties anymore, as
-this has been considered inappropriate for a library like Lucene. For
-most properties there are set/get methods available in IndexWriter which
-you should use instead. This affects the following properties:
-See IndexWriter for getter/setter methods:
- org.apache.lucene.writeLockTimeout, org.apache.lucene.commitLockTimeout,
- org.apache.lucene.minMergeDocs, org.apache.lucene.maxMergeDocs,
- org.apache.lucene.maxFieldLength, org.apache.lucene.termIndexInterval,
- org.apache.lucene.mergeFactor,
-See BooleanQuery for getter/setter methods:
- org.apache.lucene.maxClauseCount
-See FSDirectory for getter/setter methods:
- disableLuceneLocks (Daniel Naber)
-
Fixed FieldCacheImpl to use user-provided IntParser and FloatParser,
-instead of using Integer and Float classes for parsing. (Yonik Seeley via Otis Gospodnetic)
-
Expert level search routines returning TopDocs and TopFieldDocs
-no longer normalize scores. This also fixes bugs related to
-MultiSearchers and score sorting/normalization. (Luc Vanlerberghe via Yonik Seeley, LUCENE-469)
Added support for stored compressed fields (patch #31149 [LUCENE-274]) (Bernhard Messer via Christoph)
-
Added support for binary stored fields (patch #29370 [LUCENE-229]) (Drew Farris and Bernhard Messer via Christoph)
-
Added support for position and offset information in term vectors
-(patch #18927 [LUCENE-95]). (Grant Ingersoll & Christoph)
-
A new class DateTools has been added. It allows you to format dates
-in a readable format adequate for indexing. Unlike the existing
-DateField class DateTools can cope with dates before 1970 and it
-forces you to specify the desired date resolution (e.g. month, day,
-second, ...) which can make RangeQuerys on those fields more efficient. (Daniel Naber)
-
QueryParser now correctly works with Analyzers that can return more
-than one token per position. For example, a query "+fast +car"
-would be parsed as "+fast +(car automobile)" if the Analyzer
-returns "car" and "automobile" at the same position whenever it
-finds "car" (Patch #23307 [LUCENE-133]). (Pierrick Brihaye, Daniel Naber)
-
Permit unbuffered Directory implementations (e.g., using mmap).
-InputStream is replaced by the new classes IndexInput and
-BufferedIndexInput. OutputStream is replaced by the new classes
-IndexOutput and BufferedIndexOutput. InputStream and OutputStream
-are now deprecated and FSDirectory is now subclassable. (cutting)
-
Add native Directory and TermDocs implementations that work under
-GCJ. These require GCC 3.4.0 or later and have only been tested
-on Linux. Use 'ant gcj' to build demo applications. (cutting)
-
Add MMapDirectory, which uses nio to mmap input files. This is
-still somewhat slower than FSDirectory. However it uses less
-memory per query term, since a new buffer is not allocated per
-term, which may help applications which use, e.g., wildcard
-queries. It may also someday be faster. (cutting & Paul Elschot)
-
Added javadocs-internal to build.xml - bug #30360 [LUCENE-250] (Paul Elschot via Otis)
-
Added RangeFilter, a more generically useful filter than DateFilter. (Chris M Hostetter via Erik)
-
Added NumberTools, a utility class for indexing numeric fields. (adapted from code contributed by Matt Quail; committed by Erik)
-
Added public static IndexReader.main(String[] args) method.
-IndexReader can now be used directly at command line level
-to list and optionally extract the individual files from an existing
-compound index file. (adapted from code contributed by Garrett Rooney; committed by Bernhard)
-
Add IndexWriter.setTermIndexInterval() method. See javadocs. (Doug Cutting)
-
Added LucenePackage, whose static get() method returns java.util.Package,
-which lets the caller get the Lucene version information specified in
-the Lucene Jar. (Doug Cutting via Otis)
-
Added Hits.iterator() method and corresponding HitIterator and Hit objects.
-This provides standard java.util.Iterator iteration over Hits.
-Each call to the iterator's next() method returns a Hit object. (Jeremy Rayner via Erik)
-
Add ParallelReader, an IndexReader that combines separate indexes
-over different fields into a single virtual index. (Doug Cutting)
-
Add IntParser and FloatParser interfaces to FieldCache, so that
-fields in arbitrary formats can be cached as ints and floats. (Doug Cutting)
-
Added class org.apache.lucene.index.IndexModifier which combines
-IndexWriter and IndexReader, so you can add and delete documents without
-worrying about synchronization/locking issues. (Daniel Naber)
-
Lucene can now be used inside an unsigned applet, as Lucene's access
-to system properties will not cause a SecurityException anymore. (Jon Schuster via Daniel Naber, bug #34359 [LUCENE-369])
-
Added a new class MatchAllDocsQuery that matches all documents. (John Wang via Daniel Naber, bug #34946 [LUCENE-389])
-
Added ability to omit norms on a per field basis to decrease
-index size and memory consumption when there are many indexed fields.
-See Field.setOmitNorms() (Yonik Seeley, LUCENE-448)
-
Added NullFragmenter to contrib/highlighter, which is useful for
-highlighting entire documents or fields. (Erik Hatcher)
-
Added regular expression queries, RegexQuery and SpanRegexQuery.
-Note the same term enumeration caveats apply with these queries as
-apply to WildcardQuery and other term expanding queries.
-These two new queries are not currently supported via QueryParser. (Erik Hatcher)
-
Added ConstantScoreQuery which wraps a filter and produces a score
-equal to the query boost for every matching document. (Yonik Seeley, LUCENE-383)
-
Added ConstantScoreRangeQuery which produces a constant score for
-every document in the range. One advantage over a normal RangeQuery
-is that it doesn't expand to a BooleanQuery and thus doesn't have a maximum
-number of terms the range can cover. Both endpoints may also be open. (Yonik Seeley, LUCENE-383)
-
Added ability to specify a minimum number of optional clauses that
-must match in a BooleanQuery. See BooleanQuery.setMinimumNumberShouldMatch(). (Paul Elschot, Chris Hostetter via Yonik Seeley, LUCENE-395)
-
Added DisjunctionMaxQuery which provides the maximum score across its clauses.
-It's very useful for searching across multiple fields. (Chuck Williams via Yonik Seeley, LUCENE-323)
-
New class ISOLatin1AccentFilter that replaces accented characters in the ISO
-Latin 1 character set by their unaccented equivalent. (Sven Duzont via Erik Hatcher)
-
New class KeywordAnalyzer. "Tokenizes" the entire stream as a single token.
-This is useful for data like zip codes, ids, and some product names. (Erik Hatcher)
-
Copied LengthFilter from contrib area to core. Removes words that are too
-long or too short from the stream. (David Spencer via Otis and Daniel)
-
Added getPositionIncrementGap(String fieldName) to Analyzer. This allows
-custom analyzers to put gaps between Field instances with the same field
-name, preventing phrase or span queries crossing these boundaries. The
-default implementation issues a gap of 0, allowing the default token
-position increment of 1 to put the next field's first token into a
-successive position. (Erik Hatcher, with advice from Yonik)
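How the gap affects token positions can be sketched as follows. This is a simplified model of position assignment and not the indexing code itself; PositionGapSketch and its positions method are hypothetical, and every token is assumed to have a position increment of 1.

```java
import java.util.Arrays;

// Hypothetical model: assigns positions to the tokens of several instances of one field.
class PositionGapSketch {
    static int[] positions(String[][] fieldInstances, int gap) {
        int total = 0;
        for (String[] tokens : fieldInstances) {
            total += tokens.length;
        }
        int[] result = new int[total];
        int position = -1;
        int out = 0;
        for (String[] tokens : fieldInstances) {
            if (out > 0) {
                position += gap;   // gap between instances that share a field name
            }
            for (int i = 0; i < tokens.length; i++) {
                position += 1;     // assumed token position increment of 1
                result[out++] = position;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String[][] instances = { {"quick", "brown"}, {"fox"} };
        System.out.println(Arrays.toString(positions(instances, 0)));
        System.out.println(Arrays.toString(positions(instances, 100)));
    }
}
```

With a gap of 0, "brown" and "fox" end up at adjacent positions, so a phrase query for "brown fox" could match across the two instances; a large gap keeps them apart.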
-
StopFilter can now ignore case when checking for stop words. (Grant Ingersoll via Yonik, LUCENE-248)
-
Add TopDocCollector and TopFieldDocCollector. These simplify the
-implementation of hit collectors that collect only the
-top-scoring or top-sorting hits.
-
Several methods and fields have been deprecated. The API documentation
-contains information about the recommended replacements. It is planned
-that most of the deprecated methods and fields will be removed in
-Lucene 2.0. (Daniel Naber)
-
The Russian and the German analyzers have been moved to contrib/analyzers.
-Also, the WordlistLoader class has been moved one level up in the
-hierarchy and is now org.apache.lucene.analysis.WordlistLoader (Daniel Naber)
-
The API contained methods that declared to throw an IOException
-but that never did this. These declarations have been removed. If
-your code tries to catch these exceptions you might need to remove
-those catch clauses to avoid compile errors. (Daniel Naber)
-
Add a serializable Parameter Class to standardize parameter enum
-classes in BooleanClause and Field. (Christoph)
-
Added rewrite methods to all SpanQuery subclasses that nest other SpanQuerys.
-This allows custom SpanQuery subclasses that rewrite (for term expansion, for
-example) to nest within the built-in SpanQuery classes successfully.
-
The JSP demo page (src/jsp/results.jsp) now properly closes the
-IndexSearcher it opens. (Daniel Naber)
-
Fixed a bug in IndexWriter.addIndexes(IndexReader[] readers) that
-prevented deletion of obsolete segments. (Christoph Goller)
-
Fix in FieldInfos to avoid the return of an extra blank field in
-IndexReader.getFieldNames() (Patch #19058 [LUCENE-102]). (Mark Harwood via Bernhard)
-
Some combinations of BooleanQuery and MultiPhraseQuery (formerly
-PhrasePrefixQuery) could provoke UnsupportedOperationException
-(bug #33161 [LUCENE-337]). (Rhett Sutphin via Daniel Naber)
-
Small bug in skipTo of ConjunctionScorer that caused NullPointerException
-if skipTo() was called without prior call to next() fixed. (Christoph)
-
Disable Similarity.coord() in the scoring of most automatically
-generated boolean queries. The coord() score factor is
-appropriate when clauses are independently specified by a user,
-but is usually not appropriate when clauses are generated
-automatically, e.g., by a fuzzy, wildcard or range query. Matches
-on such automatically generated queries are no longer penalized
-for not matching all terms. (Doug Cutting, Patch #33472 [LUCENE-346])
-
Getting a lock file with Lock.obtain(long) was supposed to wait for
-a given amount of milliseconds, but this didn't work. (John Wang via Daniel Naber, Bug #33799 [LUCENE-353])
-
Fix FSDirectory.createOutput() to always create new files.
-Previously, existing files were overwritten, and an index could be
-corrupted when the old version of a file was longer than the new.
-Now any existing file is first removed. (Doug Cutting)
-
Fix BooleanQuery containing nested SpanTermQuery's, which previously
-could return an incorrect number of hits. (Reece Wilton via Erik Hatcher, Bug #35157 [LUCENE-393])
-
Fix NullPointerException that could occur with a MultiPhraseQuery
-inside a BooleanQuery. (Hans Hjelm and Scotty Allen via Daniel Naber, Bug #35626 [LUCENE-404])
-
Fixed SnowballFilter to pass through the position increment from
-the original token. (Yonik Seeley via Erik Hatcher, LUCENE-437)
-
Added Unicode range of Korean characters to StandardTokenizer,
-grouping contiguous characters into a token rather than one token
-per character. This change also changes the token type to "<CJ>"
-for Chinese and Japanese character tokens (previously it was "<CJK>"). (Cheolgoo Kang via Otis and Erik, LUCENE-444 and LUCENE-461)
-
FieldsReader now looks at FieldInfo.storeOffsetWithTermVector and
-FieldInfo.storePositionWithTermVector and creates the Field with
-correct TermVector parameter. (Frank Steinmann via Bernhard, LUCENE-455)
-
Fixed WildcardQuery to prevent "cat" matching "ca??". (Xiaozheng Ma via Bernhard, LUCENE-306)
-
Fixed a bug where MultiSearcher and ParallelMultiSearcher could
-change the sort order when sorting by string for documents without
-a value for the sort field. (Luc Vanlerberghe via Yonik, LUCENE-453)
-
Fixed a sorting problem with MultiSearchers that can lead to
-missing or duplicate docs due to equal docs sorting in an arbitrary order. (Yonik Seeley, LUCENE-456)
-
A single hit using the expert level sorted search methods
-resulted in the score not being normalized. (Yonik Seeley, LUCENE-462)
-
Fixed inefficient memory usage when loading an index into RAMDirectory. (Volodymyr Bychkoviak via Bernhard, LUCENE-475)
-
Corrected term offsets returned by ChineseTokenizer. (Ray Tsang via Erik Hatcher, LUCENE-324)
-
Fixed MultiReader.undeleteAll() to correctly update numDocs. (Robert Kirchgessner via Doug Cutting, LUCENE-479)
-
Race condition in IndexReader.getCurrentVersion() and isCurrent()
-fixed by acquiring the commit lock. (Luc Vanlerberghe via Yonik Seeley, LUCENE-481)
-
IndexWriter.setMaxBufferedDocs(1) didn't have the expected effect,
-this has now been fixed. (Daniel Naber)
-
Fixed QueryParser when called with a date in local form like
-"[1/16/2000 TO 1/18/2000]". This query did not include the documents
-of 1/18/2000, i.e. the last day was not included. (Daniel Naber)
-
Removed sorting constraint that threw an exception if there were
-not yet any values for the sort field (Yonik Seeley, LUCENE-374)
Disk usage (peak requirements during indexing and optimization)
-in case of compound file format has been improved. (Bernhard, Dmitry, and Christoph)
-
Optimize the performance of certain uses of BooleanScorer,
-TermScorer and IndexSearcher. In particular, a BooleanQuery
-composed of TermQuery, with not all terms required, that returns a
-TopDocs (e.g., through a Hits with no Sort specified) runs much
-faster. (cutting)
-
Removed synchronization from reading of term vectors with an
-IndexReader (Patch #30736 [LUCENE-265]). (Bernhard Messer via Christoph)
-
Optimize term-dictionary lookup to allocate far fewer terms when
-scanning for the matching term. This speeds searches involving
-low-frequency terms, where the cost of dictionary lookup can be
-significant. (cutting)
-
Optimize fuzzy queries so the standard fuzzy queries with a prefix
-of 0 now run 20-50% faster (Patch #31882 [LUCENE-296]). (Jonathan Hager via Daniel Naber)
-
A version of BooleanScorer (BooleanScorer2) was added that delivers
-documents in increasing order and implements skipTo. For queries
-with required or forbidden clauses it may be faster than the old
-BooleanScorer, for BooleanQueries consisting only of optional
-clauses it is probably slower. The new BooleanScorer is now the
-default. (Patch 31785 [LUCENE-294] by Paul Elschot via Christoph)
-
Use uncached access to norms when merging to reduce RAM usage.
-(Bug #32847 [LUCENE-326]). (Doug Cutting)
-
Don't read term index when random-access is not required. This
-reduces time to open IndexReaders and they use less memory when
-random access is not required, e.g., when merging segments. The
-term index is now read into memory lazily at the first
-random-access. (Doug Cutting)
-
Optimize IndexWriter.addIndexes(Directory[]) when the number of
-added indexes is larger than mergeFactor. Previously this could
-result in quadratic performance. Now performance is n log(n). (Doug Cutting)
-
Speed up the creation of TermEnum for indices with multiple
-segments and deleted documents, and thus speed up PrefixQuery,
-RangeQuery, WildcardQuery, FuzzyQuery, RangeFilter, DateFilter,
-and sorting the first time on a field. (Yonik Seeley, LUCENE-454)
-
Optimized and generalized 32 bit floating point to byte
-(custom 8 bit floating point) conversions. Increased the speed of
-Similarity.encodeNorm() anywhere from 10% to 250%, depending on the JVM. (Yonik Seeley, LUCENE-467)
Lucene's source code repository has converted from CVS to
-Subversion. The new repository is at
-http://svn.apache.org/repos/asf/lucene/java/trunk
-
-
Lucene's issue tracker has migrated from Bugzilla to JIRA.
-Lucene's JIRA is at http://issues.apache.org/jira/browse/LUCENE
-The old issues are still available at
-http://issues.apache.org/bugzilla/show_bug.cgi?id=xxxx (use the bug number instead of xxxx)
The JSP demo page (src/jsp/results.jsp) now properly escapes error
-messages which might contain user input (e.g. error messages about
-query parsing). If you used that page as a starting point for your
-own code please make sure your code also properly escapes HTML
-characters from user input in order to avoid so-called cross site
-scripting attacks. (Daniel Naber)
-
QueryParser changes in 1.4.2 broke the QueryParser API. Now the old
-API is supported again. (Christoph)
Fixed bug #31241 [LUCENE-277]: Sorting could lead to incorrect results (documents
-missing, others duplicated) if the sort keys were not unique and there
-were more than 100 matches. (Daniel Naber)
-
Memory leak in Sort code (bug #31240 [LUCENE-276]) eliminated. (Rafal Krzewski via Christoph and Daniel)
-
FuzzyQuery now takes an additional parameter that specifies the
-minimum similarity that is required for a term to match the query.
-The QueryParser syntax for this is term~x, where x is a floating
-point number >= 0 and < 1 (a bigger number means that a higher
-similarity is required). Furthermore, a prefix can be specified
-for FuzzyQuerys so that only those terms are considered similar that
-start with this prefix. This can speed up FuzzyQuery greatly. (Daniel Naber, Christoph Goller)
-
PhraseQuery and PhrasePrefixQuery now allow the explicit specification
-of relative positions. (Christoph Goller)
-
QueryParser changes: Fix for ArrayIndexOutOfBoundsExceptions
-(patch #9110 [LUCENE-35]); some unused method parameters removed; The ability
-to specify a minimum similarity for FuzzyQuery has been added. (Christoph Goller)
-
IndexSearcher optimization: a new ScoreDoc is no longer allocated
-for every non-zero-scoring hit. This makes 'OR' queries that
-contain common terms substantially faster. (cutting)
Added "an" to the list of stop words in StopAnalyzer, to complement
-the existing "a" there. Fix for bug 28960 [LUCENE-132]
- (http://issues.apache.org/bugzilla/show_bug.cgi?id=28960). (Otis)
-
Added new class FieldCache to manage in-memory caches of field term
-values. (Tim Jones)
-
Added overloaded getFieldQuery method to QueryParser which
-accepts the slop factor specified for the phrase (or the default
-phrase slop for the QueryParser instance). This allows overriding
-methods to replace a PhraseQuery with a SpanNearQuery instead,
-keeping the proper slop factor. (Erik Hatcher)
-
Changed the encoding of GermanAnalyzer.java and GermanStemmer.java to
-UTF-8 and changed the build encoding to UTF-8, to make changed files
-compile. (Otis Gospodnetic)
-
Removed synchronization from term lookup under IndexReader methods
-termFreq(), termDocs() or termPositions() to improve
-multi-threaded performance. (cutting)
-
Fix a bug where obsolete segment files were not deleted on Win32.
-
Fixed several search bugs introduced by the skipTo() changes in
-release 1.4RC1. The index file format was changed a bit, so
-collections must be re-indexed to take advantage of the skipTo()
-optimizations. (Christoph Goller)
-
Added new Document methods, removeField() and removeFields(). (Christoph Goller)
-
Fixed inconsistencies with index closing. Indexes and directories
-are now only closed automatically by Lucene when Lucene opened
-them automatically. (Christoph Goller)
-
Added new class: FilteredQuery. (Tim Jones)
-
Added a new SortField type for custom comparators. (Tim Jones)
-
Lock obtain timed out message now displays the full path to the lock
-file. (Daniel Naber via Erik)
-
Fixed a bug in SpanNearQuery when ordered. (Paul Elschot via cutting)
-
Fixed so that FSDirectory's locks still work when the
-java.io.tmpdir system property is null. (cutting)
-
Changed FilteredTermEnum's constructor to take no parameters,
-as the parameters were ignored anyway (bug #28858 [LUCENE-224])
GermanAnalyzer now throws an exception if the stopword file
-cannot be found (bug #27987 [LUCENE-203]). It now uses LowerCaseFilter
-(bug #18410 [LUCENE-87]) (Daniel Naber via Otis, Erik)
-
Fixed a few bugs in the file format documentation. (cutting)
Changed the format of the .tis file, so that:
-
-- it has a format version number, which makes it easier to
- back-compatibly change file formats in the future.
-
-- the term count is now stored as a long. This was the one aspect
 of Lucene's file formats which limited index size.
-
-- a few internal index parameters are now stored in the index, so
- that they can (in theory) now be changed from index to index,
- although there is not yet an API to do so.
-
-These changes are back compatible. The new code can read old
-indexes. But old code will not be able to read new indexes. (cutting)
-
Added an optimized implementation of TermDocs.skipTo(). A skip
-table is now stored for each term in the .frq file. This only
-adds a percent or two to overall index size, but can substantially
-speed up many searches. (cutting)
-
Restructured the Scorer API and all Scorer implementations to take
-advantage of an optimized TermDocs.skipTo() implementation. In
-particular, PhraseQuerys and conjunctive BooleanQuerys are
-faster when one clause has substantially fewer matches than the
-others. (A conjunctive BooleanQuery is a BooleanQuery where all
-clauses are required.) (cutting)
-
Added new class ParallelMultiSearcher. Combined with
-RemoteSearchable this makes it easy to implement distributed
-search systems. (Jean-Francois Halleux via cutting)
-
Added support for hit sorting. Results may now be sorted by any
-indexed field. For details see the javadoc for
-Searcher#search(Query, Sort). (Tim Jones via Cutting)
-
Changed FSDirectory to auto-create a full directory tree that it
-needs by using mkdirs() instead of mkdir(). (Mladen Turk via Otis)
-
Added a new span-based query API. This implements, among other
-things, nested phrases. See javadocs for details. (Doug Cutting)
-
Added new method Query.getSimilarity(Searcher), and changed
-scorers to use it. This permits one to subclass a Query class so
-that it can specify its own Similarity implementation, perhaps
-one that delegates through that of the Searcher. (Julien Nioche
-via Cutting)
-
Added MultiReader, an IndexReader that combines multiple other
-IndexReaders. (Cutting)
-
Added support for term vectors. See Field#isTermVectorStored(). (Grant Ingersoll, Cutting & Dmitry)
-
Fixed the old bug with escaping of special characters in query
-strings: http://issues.apache.org/bugzilla/show_bug.cgi?id=24665 (Jean-Francois Halleux via Otis)
-
Added support for overriding default values for the following,
-using system properties:
- - default commit lock timeout
- - default maxFieldLength
- - default maxMergeDocs
- - default mergeFactor
- - default minMergeDocs
- - default write lock timeout (Otis)
-
Changed QueryParser.jj to allow '-' and '+' within tokens:
-http://issues.apache.org/bugzilla/show_bug.cgi?id=27491 (Morus Walter via Otis)
-
Changed so that the compound index format is used by default.
-This makes indexing a bit slower, but vastly reduces the chances
-of file handle problems. (Cutting)
Added catch of BooleanQuery$TooManyClauses in QueryParser to
-throw ParseException instead. (Erik Hatcher)
-
Fixed a NullPointerException in Query.explain(). (Doug Cutting)
-
Added a new method IndexReader.setNorm(), that permits one to
-alter the boosting of fields after an index is created.
-
-
Distinguish between the final position and length when indexing a
-field. The length is now defined as the total number of tokens,
-instead of the final position, as it was previously. Length is
-used for score normalization (Similarity.lengthNorm()) and for
-controlling memory usage (IndexWriter.maxFieldLength). In both of
-these cases, the total number of tokens is a better value to use
-than the final token position. Position is used in phrase
-searching (see PhraseQuery and Token.setPositionIncrement()).
-
-
Fix StandardTokenizer's handling of CJK characters (Chinese,
-Japanese and Korean ideograms). Previously contiguous sequences
-were combined in a single token, which is not very useful. Now
-each ideogram generates a separate token, which is more useful.
-
Added minMergeDocs in IndexWriter. This can be raised to speed
-indexing without altering the number of files, but only using more
-memory. (Julien Nioche via Otis)
Fix bug #16952 [LUCENE-85], in demo HTML parser, skip comments in
-javascript. (Christoph Goller)
-
Fix bug #19253 [LUCENE-105], in demo HTML parser, add whitespace as needed to
-output (Daniel Naber via Christoph Goller)
-
Fix bug #24301 [LUCENE-159], in demo HTML parser, long titles no longer
-hang things. (Christoph Goller)
-
Fix bug #23534 [LUCENE-138], Replace use of file timestamp of segments file
-with an index version number stored in the segments file. This
-resolves problems when running on file systems with low-resolution
-timestamps, e.g., HFS under MacOS X. (Christoph Goller)
-
Fix QueryParser so that TokenMgrError is not thrown, only
-ParseException. (Erik Hatcher)
-
Fix some bugs introduced by change 11 of RC2. (Christoph Goller)
-
Fixed a problem compiling TestRussianStem. (Christoph Goller)
Added getFieldNames(boolean) to IndexReader, SegmentReader, and
-SegmentsReader. (Julien Nioche via otis)
-
Changed file locking to place lock files in
-System.getProperty("java.io.tmpdir"), where all users are
-permitted to write files. This way folks can open and correctly
-lock indexes which are read-only to them.
-
-
IndexWriter: added a new method, addDocument(Document, Analyzer),
-permitting one to easily use different analyzers for different
-documents in the same index.
-
-
Minor enhancements to FuzzyTermEnum. (Christoph Goller via Otis)
-
PriorityQueue: added insert(Object) method and adjusted IndexSearcher
-and MultiIndexSearcher to use it. (Christoph Goller via Otis)
-
Fixed a bug in IndexWriter that returned incorrect docCount(). (Christoph Goller via Otis)
-
Fixed SegmentsReader to eliminate the confusing and slightly different
-behaviour of TermEnum when dealing with an enumeration of all terms,
-versus an enumeration starting from a specific term.
-This patch also fixes incorrect term document frequencies when the same term
-is present in multiple segments. (Christoph Goller via Otis)
-
Added CachingWrapperFilter and PerFieldAnalyzerWrapper. (Erik Hatcher)
-
Added support for the new "compound file" index format (Dmitry
-Serebrennikov)
-
Added Locale setting to QueryParser, for use by date range parsing.
-
-
Changed IndexReader so that it can be subclassed by classes
-outside of its package. Previously it had package-private
-abstract methods. Also modified the index merging code so that it
-can work on an arbitrary IndexReader implementation, and added a
-new method, IndexWriter.addIndexes(IndexReader[]), to take
-advantage of this. (cutting)
-
Added a limit to the number of clauses which may be added to a
-BooleanQuery. The default limit is 1024 clauses. This should
-stop most OutOfMemoryExceptions by prefix, wildcard and fuzzy
-queries which run amok. (cutting)
-
Add new method: IndexReader.undeleteAll(). This undeletes all
-deleted documents which still remain in the index. (cutting)
Fixed PriorityQueue's clear() method.
-Fix for bug 9454 [LUCENE-37], http://nagoya.apache.org/bugzilla/show_bug.cgi?id=9454 (Matthijs Bomhoff via otis)
-
Changed StandardTokenizer.jj grammar for EMAIL tokens.
-Fix for bug 9015 [LUCENE-34], http://nagoya.apache.org/bugzilla/show_bug.cgi?id=9015 (Dale Anson via otis)
-
Added the ability to disable lock creation by using disableLuceneLocks
-system property. This is useful for read-only media, such as CD-ROMs. (otis)
-
Added id method to Hits to be able to access the index global id.
-Required for sorting options. (carlson)
-
Added support for new range query syntax to QueryParser.jj. (briangoetz)
-
Added the ability to retrieve HTML documents' META tag values to
-HTMLParser.jj. (Mark Harwood via otis)
-
Modified QueryParser to make it possible to programmatically specify the
-default Boolean operator (OR or AND). (Péter Halácsy via otis)
-
Made many search methods and classes non-final, per requests.
-This includes IndexWriter and IndexSearcher, among others. (cutting)
-
Added class RemoteSearchable, providing support for remote
-searching via RMI. The test class RemoteSearchableTest.java
-provides an example of how this can be used. (cutting)
-
Added PhrasePrefixQuery (and supporting MultipleTermPositions). The
-test class TestPhrasePrefixQuery provides the usage example. (Anders Nielsen via otis)
-
Changed the German stemming algorithm to ignore case while
-stripping. The new algorithm is faster and produces more equal
-stems from nouns and verbs derived from the same word. (gschwarz)
-
Added support for boosting the score of documents and fields via
-the new methods Document.setBoost(float) and Field.setBoost(float).
-
-Note: This changes the encoding of an indexed value. Indexes
-should be re-created from scratch in order for search scores to
-be correct. With the new code and an old index, searches will
-yield very large scores for shorter fields, and very small scores
-for longer fields. Once the index is re-created, scores will be
-as before. (cutting)
-
Added new method Token.setPositionIncrement().
-
-This permits, for the purpose of phrase searching, placing
-multiple terms in a single position. This is useful with
-stemmers that produce multiple possible stems for a word.
-
-This also permits the introduction of gaps between terms, so that
-terms which are adjacent in a token stream will not be matched by
-an exact phrase query. This makes it possible, e.g., to build
-an analyzer where phrases are not matched over stop words which
-have been removed.
-
-Finally, repeating a token with an increment of zero can also be
-used to boost scores of matches on that token. (cutting)
-
Added new Filter class, QueryFilter. This constrains search
-results to only match those which also match a provided query.
-Results are cached, so that searches after the first on the same
-index using this filter are very fast.
-
-This could be used, for example, with a RangeQuery on a formatted
-date field to implement date filtering. One could re-use a
-single QueryFilter that matches, e.g., only documents modified
-within the last week. The QueryFilter and RangeQuery would only
-need to be reconstructed once per day. (cutting)
-
Added a new IndexWriter method, getAnalyzer(). This returns the
-analyzer used when adding documents to this index. (cutting)
-
Fixed a bug with IndexReader.lastModified(). Before, document
-deletion did not update this. Now it does. (cutting)
-
Added Russian Analyzer. (Boris Okner via otis)
-
Added a public, extensible scoring API. For details, see the
-javadoc for org.apache.lucene.search.Similarity.
-
-
Fixed return of Hits.id() from float to int. (Terry Steichen via Peter).
-
-
Added getFieldNames() to IndexReader and Segment(s)Reader classes. (Peter Mularien via otis)
-
Added getFields(String) and getValues(String) methods.
-Contributed by Rasik Pandey on 2002-10-09 (Rasik Pandey via otis)
-
Revised internal search APIs. Changes include:
-
- a. Queries are no longer modified during a search. This makes
- it possible, e.g., to reuse the same query instance with
- multiple indexes from multiple threads.
-
- b. Term-expanding queries (e.g. PrefixQuery, WildcardQuery,
- etc.) now work correctly with MultiSearcher, fixing bugs 12619
- and 12667.
-
- c. Boosting BooleanQuery's now works, and is supported by the
- query parser (problem reported by Lee Mallabone). Thus a query
- like "(+foo +bar)^2 +baz" is now supported and equivalent to
- "(+foo^2 +bar^2) +baz".
-
- d. New method: Query.rewrite(IndexReader). This permits a
- query to re-write itself as an alternate, more primitive query.
- Most of the term-expanding query classes (PrefixQuery,
- WildcardQuery, etc.) are now implemented using this method.
-
- e. New method: Searchable.explain(Query q, int doc). This
- returns an Explanation instance that describes how a particular
- document is scored against a query. An explanation can be
- displayed as either plain text, with the toString() method, or
- as HTML, with the toHtml() method. Note that computing an
- explanation is as expensive as executing the query over the
- entire index. This is intended to be used in developing
- Similarity implementations, and, for good performance, should
- not be displayed with every hit.
-
- f. Scorer and Weight are public, not package protected. It is now
- possible for someone to write a Scorer implementation that is
- not in the org.apache.lucene.search package. This is still
- fairly advanced programming, and I don't expect anyone to do
- this anytime soon, but at least now it is possible.
-
- g. Added public accessors to the primitive query classes
- (TermQuery, PhraseQuery and BooleanQuery), permitting access to
- their terms and clauses.
-
-Caution: These are extensive changes and they have not yet been
-tested extensively. Bug reports are appreciated. (cutting)
-
Added convenience RAMDirectory constructors taking File and String
-arguments, for easy FSDirectory to RAMDirectory conversion. (otis)
-
Added code for manual renaming of files in FSDirectory, since it
-has been reported that java.io.File's renameTo(File) method sometimes
-fails on Windows JVMs. (Matt Tucker via otis)
-
Refactored QueryParser to make it easier for people to extend it.
-Added the ability to automatically lower-case Wildcard terms in
-the QueryParser. (Tatu Saloranta via otis)
Changed QueryParser.jj to have "?" be a special character which
-allowed it to be used as a wildcard term. Updated TestWildcard
-unit test also. (Ralf Hettesheimer via carlson)
Renamed build.properties to default.properties and updated
-the BUILD.txt document to describe how to override the
-default.properties settings without having to edit the file. This
-brings the build process closer to Scarab's build process. (jon)
-
Added MultiFieldQueryParser class. (Kelvin Tan, via otis)
Updated contributions section of website.
-Add XML Document #3 implementation to Document Section.
-Also added Term Highlighting to Misc Section. (carlson)
-
Fixed NullPointerException for phrase searches containing
-unindexed terms, introduced in 1.2RC3. (cutting)
-
Changed document deletion code to obtain the index write lock,
-enforcing the fact that document addition and deletion cannot be
-performed concurrently. (cutting)
-
Various documentation cleanups. (otis, acoliver)
-
Updated "powered by" links. (cutting, jon)
-
Fixed a bug in the GermanStemmer. (Bernhard Messer, via otis)
-
Changed Term and Query to implement Serializable. (scottganyo)
-
Fixed to never delete indexes added with IndexWriter.addIndexes(). (cutting)
IndexWriter: fixed a bug where adding an optimized index to an
-empty index failed. This was encountered using addIndexes to copy
-a RAMDirectory index to an FSDirectory.
-
-
RAMDirectory: fixed a bug where RAMInputStream could not read
-across more than a single buffer boundary.
-
-
Fix query parser so it accepts queries with unicode characters. (briangoetz)
-
Fix query parser so that PrefixQuery is used in preference to
-WildcardQuery when there's only an asterisk at the end of the
-term. Previously PrefixQuery would never be used.
-
-
Fix tests so they compile; fix ant file so it compiles tests
-properly. Added test cases for Analyzers and PriorityQueue.
-
-
Updated demos, added Getting Started documentation. (acoliver)
-
Added 'contributions' section to website & docs. (carlson)
-
Removed JavaCC from source distribution for copyright reasons.
-Folks must now download this separately from metamata in order to
-compile Lucene. (cutting)
-
Substantially improved the performance of DateFilter by adding the
-ability to reuse TermDocs objects. (cutting)
-
Added IndexReader methods:
- public static boolean indexExists(String directory);
- public static boolean indexExists(File directory);
- public static boolean indexExists(Directory directory);
- public static boolean isLocked(Directory directory);
- public static void unlock(Directory directory); (cutting, otis)
removed broken build scripts and libraries from distribution
-
-
SegmentsReader: fixed potential race condition
-
-
FSDirectory: fixed so that getDirectory(xxx,true) correctly
-erases the directory contents, even when the directory
-has already been accessed in this JVM.
-
-
RangeQuery: Fix issue where an inclusive range query would
-include the nearest term in the index above a non-existent
-specified upper term.
-
-
SegmentTermEnum: Fix NullPointerException in clone() method
-when the Term is null.
-
-
JDK 1.1 compatibility fix: disabled lock files for JDK 1.1,
-since they rely on a feature added in JDK 1.2.
-
Lucene now includes a grammar-based tokenizer, StandardTokenizer.
-
-
The only tokenizer included in the previous release (LetterTokenizer)
-identified terms consisting entirely of alphabetic characters. The
-new tokenizer uses a regular-expression grammar to identify more
-complex classes of terms, including numbers, acronyms, email
-addresses, etc.
-
-
StandardTokenizer serves two purposes:
-
-
1. It is a much better, general purpose tokenizer for use by
- applications as is.
-
-
The easiest way for applications to start using
-StandardTokenizer is to use StandardAnalyzer.
-
-
2. It provides a good example of grammar-based tokenization.
-
-
If an application has special tokenization requirements, it can
-implement a custom tokenizer by copying the directory containing
-the new tokenizer into the application and modifying it
-accordingly.
-
The code has been re-organized into a new package and directory
-structure for this release. It builds OK, but has not been tested
-beyond that since the re-organization.
-
-
-
-
-
diff --git a/ChangesFancyStyle.css b/ChangesFancyStyle.css
deleted file mode 100644
index 65011432e20..00000000000
--- a/ChangesFancyStyle.css
+++ /dev/null
@@ -1,30 +0,0 @@
-body {
- font-family: Georgia, "Times New Roman", Times, serif;
- color: black;
- background-color: lightgrey
-}
-
-h1 {
- font-family: Helvetica, Geneva, Arial, SunSans-Regular, sans-serif;
- color: yellow;
- background-color: lightblue
-}
-
-h2 {
- font-family: Helvetica, Geneva, Arial, SunSans-Regular, sans-serif;
- color: yellow;
- background-color: lightblue
-}
-
-a:link {
- color: blue
-}
-
-a:visited {
- color: purple
-}
-
-li {
- margin-top: 1em;
- margin-bottom: 1em;
-}
\ No newline at end of file
diff --git a/ChangesSimpleStyle.css b/ChangesSimpleStyle.css
deleted file mode 100644
index 0f8f7fab4f6..00000000000
--- a/ChangesSimpleStyle.css
+++ /dev/null
@@ -1,32 +0,0 @@
-body {
- font-family: Courier New, monospace;
- font-size: 10pt;
-}
-
-h1 {
- font-family: Courier New, monospace;
- font-size: 10pt;
-}
-
-h2 {
- font-family: Courier New, monospace;
- font-size: 10pt;
-}
-
-h3 {
- font-family: Courier New, monospace;
- font-size: 10pt;
-}
-
-a:link {
- color: blue;
-}
-
-a:visited {
- color: purple;
-}
-
-li {
- margin-top: 1em;
- margin-bottom: 1em;
-}
\ No newline at end of file
diff --git a/changes2html.pl b/changes2html.pl
deleted file mode 100755
index 16034b5377b..00000000000
--- a/changes2html.pl
+++ /dev/null
@@ -1,576 +0,0 @@
-#!/usr/bin/perl
-#
-# Transforms Lucene Java's CHANGES.txt into Changes.html
-#
-# Input is on STDIN, output is to STDOUT
-#
-#
-# Licensed to the Apache Software Foundation (ASF) under one or more
-# contributor license agreements. See the NOTICE file distributed with
-# this work for additional information regarding copyright ownership.
-# The ASF licenses this file to You under the Apache License, Version 2.0
-# (the "License"); you may not use this file except in compliance with
-# the License. You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-#
-
-use strict;
-use warnings;
-
-my $jira_url_prefix = 'http://issues.apache.org/jira/browse/';
-my $bugzilla_url_prefix = 'http://issues.apache.org/bugzilla/show_bug.cgi?id=';
-my %release_dates = &setup_release_dates;
-my $month_regex = &setup_month_regex;
-my %month_nums = &setup_month_nums;
-my %bugzilla_jira_map = &setup_bugzilla_jira_map;
-my $title = undef;
-my $release = undef;
-my $reldate = undef;
-my $relinfo = undef;
-my $sections = undef;
-my $items = undef;
-my $first_relid = undef;
-my $second_relid = undef;
-my @releases = ();
-
-my @lines = <>; # Get all input at once
-
-#
-# Parse input and build hierarchical release structure in @releases
-#
-for (my $line_num = 0 ; $line_num <= $#lines ; ++$line_num) {
- $_ = $lines[$line_num];
- next unless (/\S/); # Skip blank lines
- next if (/^\s*\$Id(?::.*)?\$/); # Skip $Id$ lines
-
- unless ($title) {
- if (/\S/) {
- s/^\s+//; # Trim leading whitespace
- s/\s+$//; # Trim trailing whitespace
- }
- $title = $_;
- next;
- }
-
- if (/\s*===+\s*(.*?)\s*===+\s*/) { # New-style release headings
- $release = $1;
- $release =~ s/^release\s*//i; # Trim "Release " prefix
- ($release, $relinfo) = ($release =~ /^(\d+(?:\.\d+)*|Trunk)\s*(.*)/i);
- $relinfo =~ s/\s*:\s*$//; # Trim trailing colon
- $relinfo =~ s/^\s*,\s*//; # Trim leading comma
- ($reldate, $relinfo) = get_release_date($release, $relinfo);
- $sections = [];
- push @releases, [ $release, $reldate, $relinfo, $sections ];
- ($first_relid = lc($release)) =~ s/\s+/_/g if ($#releases == 0);
- ($second_relid = lc($release)) =~ s/\s+/_/g if ($#releases == 1);
- $items = undef;
- next;
- }
-
- if (/^\s*([01](?:\.[0-9]{1,2}){1,2}[a-z]?(?:\s*(?:RC\d+|final))?)\s*
- ((?:200[0-7]-.*|.*,.*200[0-7].*)?)$/x) { # Old-style release heading
- $release = $1;
- $relinfo = $2;
- $relinfo =~ s/\s*:\s*$//; # Trim trailing colon
- $relinfo =~ s/^\s*,\s*//; # Trim leading comma
- ($reldate, $relinfo) = get_release_date($release, $relinfo);
- $sections = [];
- push @releases, [ $release, $reldate, $relinfo, $sections ];
- $items = undef;
- next;
- }
-
- # Section heading: no leading whitespace, initial word capitalized,
- # five words or less, and no trailing punctuation
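- if (/^([A-Z]\S*(?:\s+\S+){0,4})(?<![-.:;!()])\s*$/) {
- my $heading = $1;
- $items = [];
- push @$sections, [ $heading, $items ];
- next;
- }
-
- # Handle earlier releases without sections - create a headless section
- unless ($items) {
- $items = [];
- push @$sections, [ undef, $items ];
- }
-
- my $type;
- if (@$items) { # A list item has been encountered in this section before
- $type = $items->[0]; # 0th position of items array is list type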
- } else {
- $type = get_list_type($_);
- push @$items, $type;
- }
-
- if ($type eq 'numbered') { # The modern items list style
- # List item boundary is another numbered item or an unindented line
- my $line;
- my $item = $_;
- $item =~ s/^(\s{0,2}\d+\.\s*)//; # Trim the leading item number
- my $leading_ws_width = length($1);
- $item =~ s/\s+$//; # Trim trailing whitespace
- $item .= "\n";
-
- while ($line_num < $#lines
- and ($line = $lines[++$line_num]) !~ /^(?:\s{0,2}\d+\.\s*\S|\S)/) {
- $line =~ s/^\s{$leading_ws_width}//; # Trim leading whitespace
- $line =~ s/\s+$//; # Trim trailing whitespace
- $item .= "$line\n";
- }
- $item =~ s/\n+\Z/\n/; # Trim trailing blank lines
- push @$items, $item;
- --$line_num unless ($line_num == $#lines);
- } elsif ($type eq 'paragraph') { # List item boundary is a blank line
- my $line;
- my $item = $_;
- $item =~ s/^(\s+)//;
- my $leading_ws_width = defined($1) ? length($1) : 0;
- $item =~ s/\s+$//; # Trim trailing whitespace
- $item .= "\n";
-
- while ($line_num < $#lines and ($line = $lines[++$line_num]) =~ /\S/) {
- $line =~ s/^\s{$leading_ws_width}//; # Trim leading whitespace
- $line =~ s/\s+$//; # Trim trailing whitespace
- $item .= "$line\n";
- }
- push @$items, $item;
- --$line_num unless ($line_num == $#lines);
- } else { # $type is one of the bulleted types
- # List item boundary is another bullet or a blank line
- my $line;
- my $item = $_;
- $item =~ s/^(\s*$type\s*)//; # Trim the leading bullet
- my $leading_ws_width = length($1);
- $item =~ s/\s+$//; # Trim trailing whitespace
- $item .= "\n";
-
- while ($line_num < $#lines
- and ($line = $lines[++$line_num]) !~ /^\s*(?:$type|\Z)/) {
- $line =~ s/^\s{$leading_ws_width}//; # Trim leading whitespace
- $line =~ s/\s+$//; # Trim trailing whitespace
- $item .= "$line\n";
- }
- push @$items, $item;
- --$line_num unless ($line_num == $#lines);
- }
-}
-
-#
-# Print HTML-ified version to STDOUT
-#
-print<<"__HTML_HEADER__";
-
-
-
- $title
-
-
-
-
-
-
-
-