mirror of https://github.com/apache/lucene.git
LUCENE-3965: merge CHANGES entries
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1327832 13f79535-47bb-0310-9956-ffa450edef68
This commit is contained in:
  parent 01774f1b56
  commit dda35d8c52
@@ -358,13 +358,23 @@ Changes in Runtime Behavior

  to record any "metadata" from indexing (tokenized, omitNorms,
  IndexOptions, boost, etc.) (Mike McCandless)

* LUCENE-3309: Fast vector highlighter now inserts the
  MultiValuedSeparator for NOT_ANALYZED fields (in addition to
  ANALYZED fields). To ensure your offsets are correct you should
  provide an analyzer that returns 1 from the offsetGap method.
  (Mike McCandless)

* LUCENE-2621: Removed contrib/instantiated. (Robert Muir)

* LUCENE-1768: StandardQueryTreeBuilder no longer uses RangeQueryNodeBuilder
  for RangeQueryNodes, since these two classes were removed;
  TermRangeQueryNodeProcessor now creates TermRangeQueryNode
  instead of RangeQueryNode; the same applies for numeric nodes.
  (Vinicius Barros via Uwe Schindler)

* LUCENE-3455: QueryParserBase.newFieldQuery() will throw a ParseException if
  any of the calls to the Analyzer throw an IOException. QueryParserBase.analyzeRangePart()
  will throw a RuntimeException if an IOException is thrown by the Analyzer.

API Changes

@@ -460,6 +470,39 @@ API Changes

* LUCENE-3936: Renamed StringIndexDocValues to DocTermsIndexDocValues.
  (Martijn van Groningen)

* LUCENE-1768: The deprecated Parametric(Range)QueryNode, RangeQueryNode(Builder),
  and ParametricRangeQueryNodeProcessor were removed. (Vinicius Barros via Uwe Schindler)

* LUCENE-3820: Deprecated constructors accepting pattern matching bounds. The input
  is buffered and matched in one pass. (Dawid Weiss)

* LUCENE-2413: Deprecated PatternAnalyzer in common/miscellaneous, in favor
  of the pattern package (CharFilter, Tokenizer, TokenFilter). (Robert Muir)

* LUCENE-2413: Removed the AnalyzerUtil in common/miscellaneous. (Robert Muir)

* LUCENE-1370: Added ShingleFilter option to output unigrams if no shingles
  can be generated. (Chris Harris via Steven Rowe)

* LUCENE-2514, LUCENE-2551: JDK and ICU CollationKeyAnalyzers were changed to
  use pure byte keys when Version >= 4.0. This cuts sort key size approximately
  in half. (Robert Muir)

* LUCENE-3400: Removed DutchAnalyzer.setStemDictionary. (Chris Male)

* LUCENE-3431: Removed the deprecated QueryAutoStopWordAnalyzer.addStopWords* methods
  since they prevented reuse. Stopwords are now generated at instantiation through
  the Analyzer's constructors. (Chris Male)

* LUCENE-3434: Removed ShingleAnalyzerWrapper.set* and PerFieldAnalyzerWrapper.addAnalyzer
  since they prevented reuse. Both Analyzers should be configured at instantiation.
  (Chris Male)

* LUCENE-3765: Stopset ctors that previously took Set<?> or Map<?,String> now take
  CharArraySet and CharArrayMap respectively. Previously the behavior was confusing,
  and sometimes different depending on the type of set, and ultimately a CharArraySet
  or CharArrayMap was always used anyway. (Robert Muir)

New features

* LUCENE-2604: Added RegexpQuery support to QueryParser. Regular expressions
@@ -737,6 +780,69 @@ New features

* LUCENE-3778: Added a grouping utility class that makes it easier to use result
  grouping for pure Lucene apps. (Martijn van Groningen)

* LUCENE-2341: A new analysis/ filter: Morfologik - a dictionary-driven lemmatizer
  (accurate stemmer) for Polish (includes morphosyntactic annotations).
  (Michał Dybizbański, Dawid Weiss)

* LUCENE-2413: Consolidated Lucene/Solr analysis components into analysis/common.
  New features from Solr now available to Lucene users include:
  - o.a.l.analysis.commongrams: Constructs n-grams for frequently occurring terms
    and phrases.
  - o.a.l.analysis.charfilter.HTMLStripCharFilter: CharFilter that strips HTML
    constructs.
  - o.a.l.analysis.miscellaneous.WordDelimiterFilter: TokenFilter that splits words
    into subwords and performs optional transformations on subword groups.
  - o.a.l.analysis.miscellaneous.RemoveDuplicatesTokenFilter: TokenFilter which
    filters out Tokens at the same position and Term text as the previous token.
  - o.a.l.analysis.miscellaneous.TrimFilter: Trims leading and trailing whitespace
    from Tokens in the stream.
  - o.a.l.analysis.miscellaneous.KeepWordFilter: A TokenFilter that only keeps tokens
    with text contained in the required words (inverse of StopFilter).
  - o.a.l.analysis.miscellaneous.HyphenatedWordsFilter: A TokenFilter that puts
    hyphenated words broken into two lines back together.
  - o.a.l.analysis.miscellaneous.CapitalizationFilter: A TokenFilter that applies
    capitalization rules to tokens.
  - o.a.l.analysis.pattern: Package for pattern-based analysis, containing a
    CharFilter, Tokenizer, and TokenFilter for transforming text with regexes.
  - o.a.l.analysis.synonym.SynonymFilter: A synonym filter that supports multi-word
    synonyms.
  - o.a.l.analysis.phonetic: Package for phonetic search, containing various
    phonetic encoders such as Double Metaphone.

  Some existing analysis components changed packages:
  - o.a.l.analysis.KeywordAnalyzer -> o.a.l.analysis.core.KeywordAnalyzer
  - o.a.l.analysis.KeywordTokenizer -> o.a.l.analysis.core.KeywordTokenizer
  - o.a.l.analysis.LetterTokenizer -> o.a.l.analysis.core.LetterTokenizer
  - o.a.l.analysis.LowerCaseFilter -> o.a.l.analysis.core.LowerCaseFilter
  - o.a.l.analysis.LowerCaseTokenizer -> o.a.l.analysis.core.LowerCaseTokenizer
  - o.a.l.analysis.SimpleAnalyzer -> o.a.l.analysis.core.SimpleAnalyzer
  - o.a.l.analysis.StopAnalyzer -> o.a.l.analysis.core.StopAnalyzer
  - o.a.l.analysis.StopFilter -> o.a.l.analysis.core.StopFilter
  - o.a.l.analysis.WhitespaceAnalyzer -> o.a.l.analysis.core.WhitespaceAnalyzer
  - o.a.l.analysis.WhitespaceTokenizer -> o.a.l.analysis.core.WhitespaceTokenizer
  - o.a.l.analysis.PorterStemFilter -> o.a.l.analysis.en.PorterStemFilter
  - o.a.l.analysis.ASCIIFoldingFilter -> o.a.l.analysis.miscellaneous.ASCIIFoldingFilter
  - o.a.l.analysis.ISOLatin1AccentFilter -> o.a.l.analysis.miscellaneous.ISOLatin1AccentFilter
  - o.a.l.analysis.KeywordMarkerFilter -> o.a.l.analysis.miscellaneous.KeywordMarkerFilter
  - o.a.l.analysis.LengthFilter -> o.a.l.analysis.miscellaneous.LengthFilter
  - o.a.l.analysis.PerFieldAnalyzerWrapper -> o.a.l.analysis.miscellaneous.PerFieldAnalyzerWrapper
  - o.a.l.analysis.TeeSinkTokenFilter -> o.a.l.analysis.sinks.TeeSinkTokenFilter
  - o.a.l.analysis.CharFilter -> o.a.l.analysis.charfilter.CharFilter
  - o.a.l.analysis.BaseCharFilter -> o.a.l.analysis.charfilter.BaseCharFilter
  - o.a.l.analysis.MappingCharFilter -> o.a.l.analysis.charfilter.MappingCharFilter
  - o.a.l.analysis.NormalizeCharMap -> o.a.l.analysis.charfilter.NormalizeCharMap
  - o.a.l.analysis.CharArraySet -> o.a.l.analysis.util.CharArraySet
  - o.a.l.analysis.CharArrayMap -> o.a.l.analysis.util.CharArrayMap
  - o.a.l.analysis.ReusableAnalyzerBase -> o.a.l.analysis.util.ReusableAnalyzerBase
  - o.a.l.analysis.StopwordAnalyzerBase -> o.a.l.analysis.util.StopwordAnalyzerBase
  - o.a.l.analysis.WordListLoader -> o.a.l.analysis.util.WordListLoader
  - o.a.l.analysis.CharTokenizer -> o.a.l.analysis.util.CharTokenizer
  - o.a.l.util.CharacterUtils -> o.a.l.analysis.util.CharacterUtils

  All analyzers in contrib/analyzers and contrib/icu were moved to the
  analysis/ module. The 'smartcn' and 'stempel' components now depend on 'common'.
  (Chris Male, Robert Muir)

Optimizations

* LUCENE-2588: Don't store unnecessary suffixes when writing the terms
@@ -809,6 +915,25 @@ Bug fixes

* LUCENE-3890: Fixed NPE for grouped faceting on multi-valued fields.
  (Michael McCandless, Martijn van Groningen)

* LUCENE-2945: Fix hashCode/equals for surround query parser generated queries.
  (Paul Elschot, Simon Rosenthal, gsingers via ehatcher)

* LUCENE-3971: MappingCharFilter could return invalid final token position.
  (Dawid Weiss, Robert Muir)

* LUCENE-3820: PatternReplaceCharFilter could return invalid token positions.
  (Dawid Weiss)

* LUCENE-3969: Throw IAE on bad arguments that could cause confusing errors in
  CompoundWordTokenFilterBase, PatternTokenizer, PositionFilter,
  SnowballFilter, PathHierarchyTokenizer, ReversePathHierarchyTokenizer,
  WikipediaTokenizer, and KeywordTokenizer. ShingleFilter and
  CommonGramsFilter now populate PositionLengthAttribute. Fixed
  PathHierarchyTokenizer to reset() all state. Protect against AIOOBE in
  ReversePathHierarchyTokenizer if skip is large. Fixed wrong final
  offset calculation in PathHierarchyTokenizer.
  (Mike McCandless, Uwe Schindler, Robert Muir)

Documentation

* LUCENE-3958: Javadocs corrections for IndexWriter.

@@ -1,124 +0,0 @@

Analysis Module Change Log

For more information on past and future Lucene versions, please see:
http://s.apache.org/luceneversions

======================= Trunk (not yet released) =======================

API Changes

* LUCENE-3820: Deprecated constructors accepting pattern matching bounds. The input
  is buffered and matched in one pass. (Dawid Weiss)

* LUCENE-2413: Deprecated PatternAnalyzer in common/miscellaneous, in favor
  of the pattern package (CharFilter, Tokenizer, TokenFilter). (Robert Muir)

* LUCENE-2413: Removed the AnalyzerUtil in common/miscellaneous. (Robert Muir)

* LUCENE-1370: Added ShingleFilter option to output unigrams if no shingles
  can be generated. (Chris Harris via Steven Rowe)

* LUCENE-2514, LUCENE-2551: JDK and ICU CollationKeyAnalyzers were changed to
  use pure byte keys when Version >= 4.0. This cuts sort key size approximately
  in half. (Robert Muir)

* LUCENE-3400: Removed DutchAnalyzer.setStemDictionary. (Chris Male)

* LUCENE-3431: Removed the deprecated QueryAutoStopWordAnalyzer.addStopWords* methods
  since they prevented reuse. Stopwords are now generated at instantiation through
  the Analyzer's constructors. (Chris Male)

* LUCENE-3434: Removed ShingleAnalyzerWrapper.set* and PerFieldAnalyzerWrapper.addAnalyzer
  since they prevented reuse. Both Analyzers should be configured at instantiation.
  (Chris Male)

* LUCENE-3765: Stopset ctors that previously took Set<?> or Map<?,String> now take
  CharArraySet and CharArrayMap respectively. Previously the behavior was confusing,
  and sometimes different depending on the type of set, and ultimately a CharArraySet
  or CharArrayMap was always used anyway. (Robert Muir)

Bug fixes

* LUCENE-3971: MappingCharFilter could return invalid final token position.
  (Dawid Weiss, Robert Muir)

* LUCENE-3820: PatternReplaceCharFilter could return invalid token positions.
  (Dawid Weiss)

* LUCENE-3969: Throw IAE on bad arguments that could cause confusing errors in
  CompoundWordTokenFilterBase, PatternTokenizer, PositionFilter,
  SnowballFilter, PathHierarchyTokenizer, ReversePathHierarchyTokenizer,
  WikipediaTokenizer, and KeywordTokenizer. ShingleFilter and
  CommonGramsFilter now populate PositionLengthAttribute. Fixed
  PathHierarchyTokenizer to reset() all state. Protect against AIOOBE in
  ReversePathHierarchyTokenizer if skip is large. Fixed wrong final
  offset calculation in PathHierarchyTokenizer.
  (Mike McCandless, Uwe Schindler, Robert Muir)

New Features

* LUCENE-2341: A new analysis/ filter: Morfologik - a dictionary-driven lemmatizer
  (accurate stemmer) for Polish (includes morphosyntactic annotations).
  (Michał Dybizbański, Dawid Weiss)

* LUCENE-2413: Consolidated Lucene/Solr analysis components into common.
  New features from Solr now available to Lucene users include:
  - o.a.l.analysis.commongrams: Constructs n-grams for frequently occurring terms
    and phrases.
  - o.a.l.analysis.charfilter.HTMLStripCharFilter: CharFilter that strips HTML
    constructs.
  - o.a.l.analysis.miscellaneous.WordDelimiterFilter: TokenFilter that splits words
    into subwords and performs optional transformations on subword groups.
  - o.a.l.analysis.miscellaneous.RemoveDuplicatesTokenFilter: TokenFilter which
    filters out Tokens at the same position and Term text as the previous token.
  - o.a.l.analysis.miscellaneous.TrimFilter: Trims leading and trailing whitespace
    from Tokens in the stream.
  - o.a.l.analysis.miscellaneous.KeepWordFilter: A TokenFilter that only keeps tokens
    with text contained in the required words (inverse of StopFilter).
  - o.a.l.analysis.miscellaneous.HyphenatedWordsFilter: A TokenFilter that puts
    hyphenated words broken into two lines back together.
  - o.a.l.analysis.miscellaneous.CapitalizationFilter: A TokenFilter that applies
    capitalization rules to tokens.
  - o.a.l.analysis.pattern: Package for pattern-based analysis, containing a
    CharFilter, Tokenizer, and TokenFilter for transforming text with regexes.
  - o.a.l.analysis.synonym.SynonymFilter: A synonym filter that supports multi-word
    synonyms.
  - o.a.l.analysis.phonetic: Package for phonetic search, containing various
    phonetic encoders such as Double Metaphone.

  Some existing analysis components changed packages:
  - o.a.l.analysis.KeywordAnalyzer -> o.a.l.analysis.core.KeywordAnalyzer
  - o.a.l.analysis.KeywordTokenizer -> o.a.l.analysis.core.KeywordTokenizer
  - o.a.l.analysis.LetterTokenizer -> o.a.l.analysis.core.LetterTokenizer
  - o.a.l.analysis.LowerCaseFilter -> o.a.l.analysis.core.LowerCaseFilter
  - o.a.l.analysis.LowerCaseTokenizer -> o.a.l.analysis.core.LowerCaseTokenizer
  - o.a.l.analysis.SimpleAnalyzer -> o.a.l.analysis.core.SimpleAnalyzer
  - o.a.l.analysis.StopAnalyzer -> o.a.l.analysis.core.StopAnalyzer
  - o.a.l.analysis.StopFilter -> o.a.l.analysis.core.StopFilter
  - o.a.l.analysis.WhitespaceAnalyzer -> o.a.l.analysis.core.WhitespaceAnalyzer
  - o.a.l.analysis.WhitespaceTokenizer -> o.a.l.analysis.core.WhitespaceTokenizer
  - o.a.l.analysis.PorterStemFilter -> o.a.l.analysis.en.PorterStemFilter
  - o.a.l.analysis.ASCIIFoldingFilter -> o.a.l.analysis.miscellaneous.ASCIIFoldingFilter
  - o.a.l.analysis.ISOLatin1AccentFilter -> o.a.l.analysis.miscellaneous.ISOLatin1AccentFilter
  - o.a.l.analysis.KeywordMarkerFilter -> o.a.l.analysis.miscellaneous.KeywordMarkerFilter
  - o.a.l.analysis.LengthFilter -> o.a.l.analysis.miscellaneous.LengthFilter
  - o.a.l.analysis.PerFieldAnalyzerWrapper -> o.a.l.analysis.miscellaneous.PerFieldAnalyzerWrapper
  - o.a.l.analysis.TeeSinkTokenFilter -> o.a.l.analysis.sinks.TeeSinkTokenFilter
  - o.a.l.analysis.CharFilter -> o.a.l.analysis.charfilter.CharFilter
  - o.a.l.analysis.BaseCharFilter -> o.a.l.analysis.charfilter.BaseCharFilter
  - o.a.l.analysis.MappingCharFilter -> o.a.l.analysis.charfilter.MappingCharFilter
  - o.a.l.analysis.NormalizeCharMap -> o.a.l.analysis.charfilter.NormalizeCharMap
  - o.a.l.analysis.CharArraySet -> o.a.l.analysis.util.CharArraySet
  - o.a.l.analysis.CharArrayMap -> o.a.l.analysis.util.CharArrayMap
  - o.a.l.analysis.ReusableAnalyzerBase -> o.a.l.analysis.util.ReusableAnalyzerBase
  - o.a.l.analysis.StopwordAnalyzerBase -> o.a.l.analysis.util.StopwordAnalyzerBase
  - o.a.l.analysis.WordListLoader -> o.a.l.analysis.util.WordListLoader
  - o.a.l.analysis.CharTokenizer -> o.a.l.analysis.util.CharTokenizer
  - o.a.l.util.CharacterUtils -> o.a.l.analysis.util.CharacterUtils

  All analyzers in contrib/analyzers and contrib/icu were moved to the
  analysis module. The 'smartcn' and 'stempel' components now depend on 'common'.
  (Chris Male, Robert Muir)

* SOLR-2764: Create a NorwegianLightStemmer and NorwegianMinimalStemmer. (janhoy)

@@ -1,455 +0,0 @@

Lucene Benchmark Contrib Change Log

The Benchmark contrib package contains code for benchmarking Lucene in a variety of ways.

For more information on past and future Lucene versions, please see:
http://s.apache.org/luceneversions

3/29/2012
  LUCENE-3937: Workaround the XERCES-J bug by avoiding the broken UTF-8 decoding
  in the v2.9.1 release. Replaced the XERCESJ-1247-patched jar with the v2.9.1
  release jar. (Uwe Schindler, Robert Muir, Mike McCandless)

2/15/2011
  LUCENE-3768: fix typos in .alg files, and add test that all .alg files in conf/
  can be parsed. (Sami Siren via Robert Muir)

10/07/2011
  LUCENE-3262: Facet benchmarking - benchmark tasks and sources were added for indexing
  with facets, demonstrated in facets.alg. (Gilad Barkai, Doron Cohen)

09/25/2011
  LUCENE-3457: Upgrade commons-compress to 1.2 (and undo LUCENE-2980's workaround).
  (Doron Cohen)

05/25/2011
  LUCENE-3137: ExtractReuters supports an out-dir param suffixed by a slash. (Doron Cohen)

03/31/2011
  Updated ReadTask to the new method for obtaining a top-level deleted docs
  bitset. Also checking the bitset for null, when there are no deleted docs.
  (Steve Rowe, Mike McCandless)

  Updated NewAnalyzerTask and NewShingleAnalyzerTask to handle analyzers
  in the new org.apache.lucene.analysis.core package (KeywordAnalyzer,
  SimpleAnalyzer, etc.) (Steve Rowe, Robert Muir)

  Updated ReadTokensTask to convert tokens to their indexed forms
  (char[]->byte[]), just as the indexer does. This allows measurement
  of the conversion process, which is important for analysis components
  that customize it, e.g. (ICU)CollationKeyFilter. As a result,
  benchmarks that incorporate this task will no longer be directly
  comparable between 3.X and 4.0. (Robert Muir, Steve Rowe)
03/24/2011
  LUCENE-2977: WriteLineDocTask now automatically detects how to write -
  GZip or BZip2 or plain text - according to the output file extension.
  The bzip.compression property of WriteLineDocTask was removed. (Doron Cohen)
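As an illustration of the extension-based detection described above, a minimal .alg fragment might look like this (the `line.file.out` property name is the standard WriteLineDocTask output property; the paths are examples):

```
# Output compression is inferred from the file extension;
# the old bzip.compression property is no longer needed.
line.file.out=work/enwiki.txt.bz2
# line.file.out=work/enwiki.txt.gz   (GZip)
# line.file.out=work/enwiki.txt      (plain text)
```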
03/23/2011
  LUCENE-2980: Benchmark's ContentSource no longer requires lower-case file suffixes
  for detecting the file type (gzip/bzip2/text). As part of this fix, worked around an
  issue with gzip input streams which remained open (see COMPRESS-127).
  (Doron Cohen)

03/22/2011
  LUCENE-2978: Upgrade benchmark's commons-compress from 1.0 to 1.1, as
  the move of gzip decompression in LUCENE-1540 from Java's GZipInputStream
  to commons-compress 1.0 made it 15 times slower. In 1.1 no such slow-down
  is observed. (Doron Cohen)

03/21/2011
  LUCENE-2958: WriteLineDocTask improvements - allow emitting line docs for empty
  docs too, and be flexible about which fields are added to the line file. For this, a
  header line was added to the line file. That header is examined by LineDocSource.
  Old line files which have no header line are handled as before, imposing the
  default header. (Doron Cohen, Shai Erera, Mike McCandless)
03/21/2011
  LUCENE-2964: Allow benchmark tasks from alternative packages,
  specified through a new property "alt.tasks.packages".
  (Doron Cohen, Shai Erera)
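For example, a run that loads task classes from an additional package might declare (the package name here is hypothetical):

```
# Extra packages searched for task classes, in addition to the
# built-in org.apache.lucene.benchmark.byTask.tasks package.
alt.tasks.packages=com.example.benchmark.tasks
```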
03/20/2011
  LUCENE-2963: Easier way to run a benchmark, by calling Benchmark.exec(alg-file).
  (Doron Cohen)

03/10/2011
  LUCENE-2961: Removed lib/xml-apis.jar, since JVM 1.5+ already contains the
  JAXP 1.3 interface classes it provides.

02/05/2011
  LUCENE-1540: Improvements to contrib.benchmark for TREC collections.
  ContentSource can now process plain text files, gzip files, and bzip2 files.
  TREC doc parsing now handles the TREC gov2 collection and TREC disks 4&5-CR
  collection (both used by many TREC tasks). (Shai Erera, Doron Cohen)

01/31/2011
  LUCENE-1591: Rollback to xerces-2.9.1-patched-XERCESJ-1257.jar to work around
  XERCESJ-1257, which we hit on the current Wikipedia XML export
  (ENWIKI-20110115-pages-articles.xml) with xerces-2.10.0.jar. (Mike McCandless)

01/26/2011
  LUCENE-929: ExtractReuters first extracts to a tmp dir and then renames. That
  way, if a previous extract attempt failed, "ant extract-reuters" will still
  extract the files. (Shai Erera, Doron Cohen, Grant Ingersoll)
01/24/2011
  LUCENE-2885: Add WaitForMerges task (calls IndexWriter.waitForMerges()).
  (Mike McCandless)

10/10/2010
  The locally built patched version of the Xerces-J jar introduced
  as part of LUCENE-1591 is no longer required, because Xerces
  2.10.0, which contains a fix for XERCESJ-1257 (see
  http://svn.apache.org/viewvc?view=revision&revision=554069),
  was released earlier this year. Upgraded
  xerces-2.9.1-patched-XERCESJ-1257.jar and xml-apis-2.9.0.jar
  to xercesImpl-2.10.0.jar and xml-apis-2.10.0.jar. (Steven Rowe)
8/2/2010
  LUCENE-2582: You can now specify the default codec to use for
  writing new segments by adding default.codec = Pulsing (for
  example) in the .alg file. (Mike McCandless)
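Following the syntax given in the entry above, a minimal .alg fragment selecting the codec might read:

```
# Write new segments with the Pulsing codec (LUCENE-2582).
default.codec = Pulsing
```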
4/27/2010
  WriteLineDocTask now supports multi-threading. Also,
  StringBufferReader was renamed to StringBuilderReader and works on
  StringBuilder now. In addition, LongToEnglishContentSource starts from 0
  (instead of Long.MIN_VAL+10) and wraps around to MIN_VAL (if you ever hit
  Long.MAX_VAL). (Shai Erera)

4/07/2010
  LUCENE-2377: Enable the use of NoMergePolicy and NoMergeScheduler by
  CreateIndexTask. (Shai Erera)

3/28/2010
  LUCENE-2353: Fixed bug in Config where Windows absolute path property values
  were incorrectly handled. (Shai Erera)

3/24/2010
  LUCENE-2343: Added support for benchmarking collectors. (Grant Ingersoll, Shai Erera)

2/21/2010
  LUCENE-2254: Add support to the quality package for running
  experiments with any combination of Title, Description, and Narrative.
  (Robert Muir)

1/28/2010
  LUCENE-2223: Add a benchmark for ShingleFilter. You can wrap any
  analyzer with ShingleAnalyzerWrapper and specify shingle parameters
  with the NewShingleAnalyzer task. (Steven Rowe via Robert Muir)

1/14/2010
  LUCENE-2210: TrecTopicsReader now properly reads descriptions and
  narratives from trec topics files. (Robert Muir)
1/11/2010
  LUCENE-2181: Add a benchmark for collation. This adds NewLocaleTask,
  which sets a Locale in the run data for collation to use, and can be
  used in the future for benchmarking localized range queries and sorts.
  Also add NewCollationAnalyzerTask, which works with both JDK and ICU
  Collator implementations. Fix ReadTokensTask to not tokenize fields
  unless they should be tokenized according to the DocMaker config. The
  easiest way to run the benchmark is to run 'ant collation'.
  (Steven Rowe via Robert Muir)

12/22/2009
  LUCENE-2178: Allow multiple locations to add to the class path with
  -Dbenchmark.ext.classpath=... when running "ant run-task". (Steven
  Rowe via Mike McCandless)

12/17/2009
  LUCENE-2168: Allow negative relative thread priority for BG tasks.
  (Mike McCandless)

12/07/2009
  LUCENE-2106: ReadTask does not close its Reader when
  OpenReader/CloseReader are not used. (Mark Miller)
11/17/2009
  LUCENE-2079: Allow specifying delta thread priority after the "&";
  added log.time.step.msec to print per-time-period counts; fixed
  NearRealTimeTask to print reopen times (in msec) of each reopen, at
  the end. (Mike McCandless)

11/13/2009
  LUCENE-2050: Added ability to run tasks within a serial sequence in
  the background, by appending "&". The tasks are stopped & joined at
  the end of the sequence. Also added Wait and RollbackIndex tasks.
  Genericized NearRealTimeReaderTask to only reopen the reader
  (previously it spawned its own thread, and also did searching).
  Also changed the API of PerfRunData.getIndexReader: it now returns a
  reference, and it's your job to decRef the reader when you're done
  using it. (Mike McCandless)

11/12/2009
  LUCENE-2059: Allow TrecContentSource not to change the docname.
  Previously, it would always append the iteration # to the docname.
  With the new option content.source.excludeIteration, you can disable this.
  The resulting index can then be used with the quality package to measure
  relevance. (Robert Muir)

11/12/2009
  LUCENE-2058: Specify trec_eval submission output from the command line.
  Previously, 4 arguments were required, but the third was unused. The
  third argument is now the desired location of submission.txt. (Robert Muir)

11/08/2009
  LUCENE-2044: Added delete.percent.rand.seed to seed the Random instance
  used by DeleteByPercentTask. (Mike McCandless)

11/07/2009
  LUCENE-2043: Fix CommitIndexTask to also commit pending IndexReader
  changes. (Mike McCandless)

11/07/2009
  LUCENE-2042: Added print.hits.field, to print each hit from the
  Search* tasks. (Mike McCandless)
11/04/2009
  LUCENE-2029: Added doc.body.stored and doc.body.tokenized; each
  falls back to the non-body variant as its default. (Mike McCandless)
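A sketch of the fallback behavior described above, in .alg property form (the `doc.stored`/`doc.tokenized` names are the pre-existing non-body properties these new ones fall back to):

```
# Defaults for all fields:
doc.stored=true
doc.tokenized=true
# Body-specific overrides; if omitted, each falls back
# to the non-body variant above.
doc.body.stored=false
doc.body.tokenized=true
```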
10/28/2009
  LUCENE-1994: Fix thread safety of EnwikiContentSource and DocMaker
  when doc.reuse.fields is false. Also made doc.reuse.fields=true
  thread safe. (Mark Miller, Shai Erera, Mike McCandless)

8/4/2009
  LUCENE-1770: Add EnwikiQueryMaker. (Mark Miller)

8/04/2009
  LUCENE-1773: Add FastVectorHighlighter tasks. This is a
  non-backwards-compatible change in how subclasses of ReadTask define
  a highlighter. The methods doHighlight, isMergeContiguousFragments,
  maxNumFragments and getHighlighter are no longer used; they have been
  marked deprecated and made package private, so there is a compile-time
  error. Instead, the new getBenchmarkHighlighter method should
  return an appropriate highlighter for the task. The configuration of
  the highlighter tasks (maxFrags, mergeContiguous, etc.) is now
  accepted as params to the task. (Koji Sekiguchi via Mike McCandless)
8/03/2009
  LUCENE-1778: Add support for a log.step setting per task type. Previously, if
  you included a log.step line in the .alg file, it was applied to all
  tasks. Now you can include log.step.AddDoc or log.step.DeleteDoc (for
  example) to control logging for just these tasks. If you want to omit logging
  for any other task, include log.step=-1. The syntax is "log.step." together
  with the task's 'short' name (i.e., without the 'Task' part).
  (Shai Erera via Mark Miller)
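Putting the syntax described above together, a .alg fragment with per-task logging might read (the step values are arbitrary examples):

```
# Log AddDoc progress every 2000 docs, DeleteDoc every 100,
# and silence logging for all other tasks.
log.step.AddDoc=2000
log.step.DeleteDoc=100
log.step=-1
```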
7/24/2009
  LUCENE-1595: Deprecate LineDocMaker and EnwikiDocMaker in favor of
  using DocMaker directly, with content.source = LineDocSource or
  EnwikiContentSource. NOTE: with this change, the "id" field from
  the Wikipedia XML export is now indexed as the "docname" field
  (previously it was indexed as "docid"). Additionally, the
  SearchWithSort task now accepts all types that SortField can accept
  and no longer falls back to SortField.AUTO, which has been
  deprecated. (Mike McCandless)

7/20/2009
  LUCENE-1755: Fix WriteLineDocTask to output a document if it contains either
  a title or body (or both). (Shai Erera via Mark Miller)
7/14/2009
  LUCENE-1725: Fix the example Sort algorithm - auto is now deprecated and no longer
  works with Benchmark. Benchmark will now throw an exception if you specify sort
  fields without a type. The example sort algorithm is now typed. (Mark Miller)

7/6/2009
  LUCENE-1730: Fix TrecContentSource to use ISO-8859-1 when reading the TREC files,
  unless a different encoding is specified. Additionally, ContentSource now supports
  a content.source.encoding parameter in the configuration file.
  (Shai Erera via Mark Miller)
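A configuration fragment using the new parameter described above (the content.source value is one example source; any ContentSource applies):

```
content.source=org.apache.lucene.benchmark.byTask.feeds.TrecContentSource
# Override the ISO-8859-1 default used for TREC files.
content.source.encoding=UTF-8
```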
6/26/2009
  LUCENE-1716: Added the following support:
  - doc.tokenized.norms: specifies whether to store norms
  - doc.body.tokenized.norms: special attribute for the body field
  - doc.index.props: specifies whether DocMaker should index the properties set
    on DocData
  - writer.info.stream: specifies the info stream to set on IndexWriter (supported
    values are: SystemOut, SystemErr and a file name). (Shai Erera via Mike McCandless)
6/23/09
  LUCENE-1714: WriteLineDocTask incorrectly normalized text, by replacing only
  occurrences of "\t" with a space. It now replaces "\r\n" in addition to that,
  so that LineDocMaker won't fail. (Shai Erera via Michael McCandless)
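The fix amounts to a whitespace normalization like the following self-contained sketch (the class and method names here are illustrative; the real logic lives inside WriteLineDocTask):

```java
public class LineDocNormalizer {
    // Collapse the separators a one-line-per-document file must not contain:
    // "\r\n" line terminators and "\t" field separators both become spaces,
    // so each document stays on a single line with intact field boundaries.
    static String normalize(String text) {
        return text.replace("\r\n", " ").replace("\t", " ");
    }

    public static void main(String[] args) {
        System.out.println(normalize("title\tfirst line\r\nsecond line"));
    }
}
```

Replacing "\r\n" before "\t" keeps the two substitutions independent; neither replacement can produce text the other would match.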
6/17/09
|
||||
LUCENE-1595: This issue breaks previous external algorithms. DocMaker has been
|
||||
replaced with a concrete class which accepts a ContentSource for iterating over
|
||||
a content source's documents. Most of the old DocMakers were changed to a
|
||||
ContentSource implementation, and DocMaker is now a default document creation impl
|
||||
that provides an easy way for reusing fields. When [doc.maker] is not defined in
|
||||
an algorithm, the new DocMaker is the default. If you have .alg files which
|
||||
specify a DocMaker (like ReutersDocMaker), you should change the [doc.maker] line to:
|
||||
[content.source=org.apache.lucene.benchmark.byTask.feeds.ReutersContentSource]
|
||||
|
||||
i.e.
|
||||
doc.maker=org.apache.lucene.benchmark.byTask.feeds.ReutersDocMaker
|
||||
becomes
|
||||
content.source=org.apache.lucene.benchmark.byTask.feeds.ReutersContentSource
|
||||
|
||||
doc.maker=org.apache.lucene.benchmark.byTask.feeds.SimpleDocMaker
|
||||
becomes
|
||||
content.source=org.apache.lucene.benchmark.byTask.feeds.SingleDocSource
|
||||
|
||||
Also, PerfTask now logs a message in tearDown() rather than each Task doing its
|
||||
own logging. A new setting called [log.step] is consulted to determine how often
|
||||
to log. [doc.add.log.step] is no longer a valid setting. For easy migration of
|
||||
current .alg files, rename [doc.add.log.step] to [log.step] and [doc.delete.log.step]
|
||||
to [delete.log.step].
|
||||
|
||||
Additionally, [doc.maker.forever] should be changed to [content.source.forever].
|
||||
(Shai Erera via Mark Miller)
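
Putting the renamed settings together, a migrated .alg fragment might look like
this (the values are illustrative; only the property names quoted in the entry
above are taken from it):

```
# old-style settings (no longer valid after LUCENE-1595):
#   doc.maker=org.apache.lucene.benchmark.byTask.feeds.ReutersDocMaker
#   doc.add.log.step=1000
#   doc.maker.forever=false

# migrated equivalents:
content.source=org.apache.lucene.benchmark.byTask.feeds.ReutersContentSource
log.step=1000
content.source.forever=false
```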

6/12/09
LUCENE-1539: Added DeleteByPercentTask which enables deleting a
percentage of documents and searching on them. Changed CommitIndex
to optionally accept a label (recorded as userData=<label> in the
commit point). Added FlushReaderTask, and modified OpenReaderTask
to also optionally take a label referencing a commit point to open.
Also changed default autoCommit (when IndexWriter is opened) to
true. (Jason Rutherglen via Mike McCandless)

12/20/08
LUCENE-1495: Allow a task sequence to run for a specified number of seconds by adding ": 2.7s" (for example).
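
For example, assuming the usual .alg sequence syntax, a sequence that should
repeat for a fixed wall-clock time rather than a fixed count might be written
as (the task and sequence names are illustrative):

```
# repeat the AddDoc sequence for 2.7 seconds instead of a fixed count
{ "AddDocs" AddDoc } : 2.7s
```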

12/16/08
LUCENE-1493: Stop using deprecated Hits API for searching; add new
param search.num.hits to set top N docs to collect.

12/16/08
LUCENE-1492: Added optional readOnly param (default true) to OpenReader task.

9/9/08
LUCENE-1243: Added new sorting benchmark capabilities. Also Reopen and commit tasks. (Mark Miller via Grant Ingersoll)

5/10/08
LUCENE-1090: remove relative-path assumptions from benchmark code.
Only build.xml was modified: work-dir definition must remain so
benchmark tests can run from both trunk-home and benchmark-home.

3/9/08
LUCENE-1209: Fixed DocMaker settings by round. Prior to this fix, DocMaker settings of
the first round were used in all rounds. (E.g. term vectors.)
(Mark Miller via Doron Cohen)

1/30/08
LUCENE-1156: Fixed redirect problem in EnwikiDocMaker. Refactored ExtractWikipedia to use EnwikiDocMaker. Added property to EnwikiDocMaker to allow
for skipping image-only documents.

1/24/2008
LUCENE-1136: add ability to not count sub-task doLogic increment

1/23/2008
LUCENE-1129: ReadTask properly uses the traversalSize value
LUCENE-1128: Added support for benchmarking the highlighter

01/20/08
LUCENE-1139: various fixes
- add merge.scheduler, merge.policy config properties
- refactor Open/CreateIndexTask to share setting config on IndexWriter
- added doc.reuse.fields=true|false for LineDocMaker
- OptimizeTask now takes int param to call optimize(int maxNumSegments)
- CloseIndexTask now takes bool param to call close(false) (abort running merges)
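
A sketch of how the LUCENE-1139 additions might be combined in an .alg file
(the class choices and parameter values are illustrative, not prescribed by
the entry):

```
merge.scheduler=org.apache.lucene.index.ConcurrentMergeScheduler
merge.policy=org.apache.lucene.index.LogByteSizeMergePolicy
doc.reuse.fields=true

CreateIndex
{ "AddDocs" AddDoc } : 10000
Optimize(10)       # calls optimize(int maxNumSegments)
CloseIndex(false)  # calls close(false), aborting running merges
```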

01/03/08
LUCENE-1116: quality package improvements:
- add MRR computation;
- allow control of max #queries to run;
- verify log & report are flushed.
- add TREC query reader for the 1MQ track.

12/31/07
LUCENE-1102: EnwikiDocMaker now indexes the docid field, so results might not be comparable with results prior to this change,
though this one small field is unlikely to make much difference.

12/13/07
LUCENE-1086: DocMakers setup for the "docs.dir" property
fixed to properly handle absolute paths. (Shai Erera via Doron Cohen)

9/18/07
LUCENE-941: infinite loop for alg: {[AddDoc(4000)]: 4} : *
ResetInputsTask fixed to work also after exhaustion.
All Reset Tasks now subclass ResetInputsTask.

8/9/07
LUCENE-971: Change enwiki tasks to a doc maker (extending
LineDocMaker) that directly processes the Wikipedia XML and produces
documents. Intermediate files (one per document) are no longer
created.

8/1/07
LUCENE-967: Add "ReadTokensTask" to allow for benchmarking just tokenization.

7/27/07
LUCENE-836: Add support for search quality benchmarking: running
a set of queries against a searcher, optionally producing a submission
report, and, if query judgements are available, computing quality measures:
recall, precision_at_N, average_precision, MAP. A TREC-specific Judge (based
on TREC QRels) and a TREC Topics reader are included in o.a.l.benchmark.quality.trec,
but any other format of queries and judgements can be implemented and used.

7/24/07
LUCENE-947: Add support for creating an index "one document per
line" from a large text file, which reduces the per-document overhead of
opening a single file for each document.

6/30/07
LUCENE-848: Added support for Wikipedia benchmarking.

6/25/07
- LUCENE-940: Multi-threaded issues fixed: SimpleDateFormat; logging for addDoc/deleteDoc tasks.
- LUCENE-945: tests fail to find data dirs. Added sys-prop benchmark.work.dir and cfg-prop work.dir.
(Doron Cohen)

4/17/07
- LUCENE-863: Deprecated StandardBenchmarker in favour of byTask code.
(Otis Gospodnetic)

4/13/07

Better error handling and javadocs around "exhaustive" doc making.

3/25/07

LUCENE-849:
1. Which HTML parser is used is configurable with the html.parser property.
2. External classes added to classpath with -Dbenchmark.ext.classpath=path.
3. '*' as repeating number now means "exhaust doc maker - no repetitions".
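
The three LUCENE-849 items above might be exercised together roughly as follows
(the parser class name and the jar path are illustrative assumptions):

```
# 1. select an HTML parser implementation via the html.parser property
html.parser=org.apache.lucene.benchmark.byTask.feeds.DemoHTMLParser

# 2. external classes go on the classpath at invocation time, e.g.
#    ant run-task -Dbenchmark.ext.classpath=/path/to/extra-tasks.jar

# 3. '*' as the repeat count: run until the doc maker is exhausted
{ "AddDocs" AddDoc } : *
```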

3/22/07

- Moved withRetrieve() call out of the loop in ReadTask
- Added SearchTravRetLoadFieldSelectorTask to help benchmark some of the FieldSelector capabilities
- Added options to store content bytes on the Reuters Doc (and others, but Reuters is the only one w/ it enabled)

3/21/07

Tests (for benchmarking code correctness) were added - LUCENE-840.
To be invoked by "ant test" from contrib/benchmark. (Doron Cohen)

3/19/07

1. Introduced an AbstractQueryMaker to hold common QueryMaker code. (GSI)
2. Added traversalSize parameter to SearchTravRetTask and SearchTravTask. Changed SearchTravRetTask to extend SearchTravTask. (GSI)
3. Added FileBasedQueryMaker to run queries from a File or resource. (GSI)
4. Modified query-maker generation for read related tasks to make further read task additions simpler and safer. (DC)
5. Changed Tasks' setParams() to throw UnsupportedOperationException if that task does not support command line params. (DC)
6. Improved javadoc to specify all properties command line params currently supported. (DC)
7. Refactored ReportTasks so that it is easy/possible now to create new report tasks. (DC)

01/09/07

1. Committed Doron Cohen's benchmarking contribution, which provides an easily expandable task based approach to benchmarking. See the javadocs for information. (Doron Cohen via Grant Ingersoll)

2. Added this file.

3. 2/11/07: LUCENE-790 and 788: Fixed Locale issue with date formatter. Fixed some minor issues with benchmarking by task. Added a dependency
on the Lucene demo to the build classpath. (Doron Cohen, Grant Ingersoll)

4. 2/13/07: LUCENE-801: build.xml now builds Lucene core and Demo first and has classpath dependencies on the output of that build. (Doron Cohen, Grant Ingersoll)

QueryParser Module Change Log

For more information on past and future Lucene versions, please see:
http://s.apache.org/luceneversions

======================= Trunk (not yet released) =======================

Changes in runtime behavior

* LUCENE-1768: StandardQueryTreeBuilder no longer uses RangeQueryNodeBuilder
  for RangeQueryNodes, since these two classes were removed;
  TermRangeQueryNodeProcessor now creates TermRangeQueryNode
  instead of RangeQueryNode; the same applies for numeric nodes.
  (Vinicius Barros via Uwe Schindler)

* LUCENE-3455: QueryParserBase.newFieldQuery() will throw a ParseException if
  any of the calls to the Analyzer throw an IOException. QueryParserBase.analyzeRangePart()
  will throw a RuntimeException if an IOException is thrown by the Analyzer.

API Changes

* LUCENE-1768: The deprecated Parametric(Range)QueryNode, RangeQueryNode(Builder)
  and ParametricRangeQueryNodeProcessor classes were removed.
  (Vinicius Barros via Uwe Schindler)

Bug fixes

* LUCENE-2945: Fix hashCode/equals for surround query parser generated queries.
  (Paul Elschot, Simon Rosenthal, gsingers via ehatcher)