mirror of https://github.com/apache/lucene.git
288 lines
12 KiB
Plaintext
288 lines
12 KiB
Plaintext
Lucene Benchmark Contrib Change Log
|
|
|
|
The Benchmark contrib package contains code for benchmarking Lucene in a variety of ways.
|
|
|
|
$Id:$
|
|
|
|
11/13/2009
|
|
LUCENE-2050: Added ability to run tasks within a serial sequence in
|
|
the background, by appending "&". The tasks are stopped & joined at
|
|
the end of the sequence. Also added Wait and RollbackIndex tasks.
|
|
Genericized NearRealTimeReaderTask to only reopen the reader
|
|
(previously it spawned its own thread, and also did searching).
|
|
Also changed the API of PerfRunData.getIndexReader: it now returns a
|
|
reference, and it's your job to decRef the reader when you're done
|
|
using it. (Mike McCandless)
|
|
|
|
11/12/2009
|
|
LUCENE-2059: allow TrecContentSource not to change the docname.
|
|
Previously, it would always append the iteration # to the docname.
|
|
With the new option content.source.excludeIteration, you can disable this.
|
|
The resulting index can then be used with the quality package to measure
|
|
relevance. (Robert Muir)
|
|
|
|
11/12/2009
|
|
LUCENE-2058: specify trec_eval submission output from the command line.
|
|
Previously, 4 arguments were required, but the third was unused. The
|
|
third argument is now the desired location of submission.txt (Robert Muir)
|
|
|
|
11/08/2009
|
|
LUCENE-2044: Added delete.percent.rand.seed to seed the Random instance
|
|
used by DeleteByPercentTask. (Mike McCandless)
|
|
|
|
11/07/2009
|
|
LUCENE-2043: Fix CommitIndexTask to also commit pending IndexReader
|
|
changes (Mike McCandless)
|
|
|
|
11/07/2009
|
|
LUCENE-2042: Added print.hits.field, to print each hit from the
|
|
Search* tasks. (Mike McCandless)
|
|
|
|
11/04/2009
|
|
LUCENE-2029: Added doc.body.stored and doc.body.tokenized; each
|
|
falls back to the non-body variant as its default. (Mike McCandless)
|
|
|
|
10/28/2009
|
|
LUCENE-1994: Fix thread safety of EnwikiContentSource and DocMaker
|
|
when doc.reuse.fields is false. Also made docs.reuse.fields=true
|
|
thread safe. (Mark Miller, Shai Erera, Mike McCandless)
|
|
|
|
8/4/2009
|
|
LUCENE-1770: Add EnwikiQueryMaker (Mark Miller)
|
|
|
|
8/04/2009
|
|
LUCENE-1773: Add FastVectorHighlighter tasks. This change is a
|
|
non-backwards compatible change in how subclasses of ReadTask define
|
|
a highlighter. The methods doHighlight, isMergeContiguousFragments,
|
|
maxNumFragments and getHighlighter are no longer used and have been
|
|
mark deprecated and package protected private so there's a compile
|
|
time error. Instead, the new getBenchmarkHighlighter method should
|
|
return an appropriate highlighter for the task. The configuration of
|
|
the highlighter tasks (maxFrags, mergeContiguous, etc.) is now
|
|
accepted as params to the task. (Koji Sekiguchi via Mike McCandless)
|
|
|
|
8/03/2009
|
|
LUCENE-1778: Add support for log.step setting per task type. Perviously, if
|
|
you included a log.step line in the .alg file, it had been applied to all
|
|
tasks. Now, you can include a log.step.AddDoc, or log.step.DeleteDoc (for
|
|
example) to control logging for just these tasks. If you want to ommit logging
|
|
for any other task, include log.step=-1. The syntax is "log.step." together
|
|
with the Task's 'short' name (i.e., without the 'Task' part).
|
|
(Shai Erera via Mark Miller)
|
|
|
|
7/24/2009
|
|
LUCENE-1595: Deprecate LineDocMaker and EnwikiDocMaker in favor of
|
|
using DocMaker directly, with content.source = LineDocSource or
|
|
EnwikiContentSource. NOTE: with this change, the "id" field from
|
|
the Wikipedia XML export is now indexed as the "docname" field
|
|
(previously it was indexed as "docid"). Additionaly, the
|
|
SearchWithSort task now accepts all types that SortField can accept
|
|
and no longer falls back to SortField.AUTO, which has been
|
|
deprecated. (Mike McCandless)
|
|
|
|
7/20/2009
|
|
LUCENE-1755: Fix WriteLineDocTask to output a document if it contains either
|
|
a title or body (or both). (Shai Erera via Mark Miller)
|
|
|
|
7/14/2009
|
|
LUCENE-1725: Fix the example Sort algorithm - auto is now deprecated and no longer works
|
|
with Benchmark. Benchmark will now throw an exception if you specify sort fields without
|
|
a type. The example sort algorithm is now typed. (Mark Miller)
|
|
|
|
7/6/2009
|
|
LUCENE-1730: Fix TrecContentSource to use ISO-8859-1 when reading the TREC files,
|
|
unless a different encoding is specified. Additionally, ContentSource now supports
|
|
a content.source.encoding parameter in the configuration file.
|
|
(Shai Erera via Mark Miller)
|
|
|
|
6/26/2009
|
|
LUCENE-1716: Added the following support:
|
|
doc.tokenized.norms: specifies whether to store norms
|
|
doc.body.tokenized.norms: special attribute for the body field
|
|
doc.index.props: specifies whether DocMaker should index the properties set on
|
|
DocData
|
|
writer.info.stream: specifies the info stream to set on IndexWriter (supported
|
|
values are: SystemOut, SystemErr and a file name). (Shai Erera via Mike McCandless)
|
|
|
|
6/23/09
|
|
LUCENE-1714: WriteLineDocTask incorrectly normalized text, by replacing only
|
|
occurrences of "\t" with a space. It now replaces "\r\n" in addition to that,
|
|
so that LineDocMaker won't fail. (Shai Erera via Michael McCandless)
|
|
|
|
6/17/09
|
|
LUCENE-1595: This issue breaks previous external algorithms. DocMaker has been
|
|
replaced with a concrete class which accepts a ContentSource for iterating over
|
|
a content source's documents. Most of the old DocMakers were changed to a
|
|
ContentSource implementation, and DocMaker is now a default document creation impl
|
|
that provides an easy way for reusing fields. When [doc.maker] is not defined in
|
|
an algorithm, the new DocMaker is the default. If you have .alg files which
|
|
specify a DocMaker (like ReutersDocMaker), you should change the [doc.maker] line to:
|
|
[content.source=org.apache.lucene.benchmark.byTask.feeds.ReutersContentSource]
|
|
|
|
i.e.
|
|
doc.maker=org.apache.lucene.benchmark.byTask.feeds.ReutersDocMaker
|
|
becomes
|
|
content.source=org.apache.lucene.benchmark.byTask.feeds.ReutersContentSource
|
|
|
|
doc.maker=org.apache.lucene.benchmark.byTask.feeds.SimpleDocMaker
|
|
becomes
|
|
content.source=org.apache.lucene.benchmark.byTask.feeds.SingleDocSource
|
|
|
|
Also, PerfTask now logs a message in tearDown() rather than each Task doing its
|
|
own logging. A new setting called [log.step] is consulted to determine how often
|
|
to log. [doc.add.log.step] is no longer a valid setting. For easy migration of
|
|
current .alg files, rename [doc.add.log.step] to [log.step] and [doc.delete.log.step]
|
|
to [delete.log.step].
|
|
|
|
Additionally, [doc.maker.forever] should be changed to [content.source.forever].
|
|
(Shai Erera via Mark Miller)
|
|
|
|
6/12/09
|
|
LUCENE-1539: Added DeleteByPercentTask which enables deleting a
|
|
percentage of documents and searching on them. Changed CommitIndex
|
|
to optionally accept a label (recorded as userData=<label> in the
|
|
commit point). Added FlushReaderTask, and modified OpenReaderTask
|
|
to also optionally take a label referencing a commit point to open.
|
|
Also changed default autoCommit (when IndexWriter is opened) to
|
|
true. (Jason Rutherglen via Mike McCandless)
|
|
|
|
12/20/08
|
|
LUCENE-1495: Allow task sequence to run for specfied number of seconds by adding ": 2.7s" (for example).
|
|
|
|
12/16/08
|
|
LUCENE-1493: Stop using deprecated Hits API for searching; add new
|
|
param search.num.hits to set top N docs to collect.
|
|
|
|
12/16/08
|
|
LUCENE-1492: Added optional readOnly param (default true) to OpenReader task.
|
|
|
|
9/9/08
|
|
LUCENE-1243: Added new sorting benchmark capabilities. Also Reopen and commit tasks. (Mark Miller via Grant Ingersoll)
|
|
|
|
5/10/08
|
|
LUCENE-1090: remove relative paths assumptions from benchmark code.
|
|
Only build.xml was modified: work-dir definition must remain so
|
|
benchmark tests can run from both trunk-home and benchmark-home.
|
|
|
|
3/9/08
|
|
LUCENE-1209: Fixed DocMaker settings by round. Prior to this fix, DocMaker settings of
|
|
first round were used in all rounds. (E.g. term vectors.)
|
|
(Mark Miller via Doron Cohen)
|
|
|
|
1/30/08
|
|
LUCENE-1156: Fixed redirect problem in EnwikiDocMaker. Refactored ExtractWikipedia to use EnwikiDocMaker. Added property to EnwikiDocMaker to allow
|
|
for skipping image only documents.
|
|
|
|
1/24/2008
|
|
LUCENE-1136: add ability to not count sub-task doLogic increment
|
|
|
|
1/23/2008
|
|
LUCENE-1129: ReadTask properly uses the traversalSize value
|
|
LUCENE-1128: Added support for benchmarking the highlighter
|
|
|
|
01/20/08
|
|
LUCENE-1139: various fixes
|
|
- add merge.scheduler, merge.policy config properties
|
|
- refactor Open/CreateIndexTask to share setting config on IndexWriter
|
|
- added doc.reuse.fields=true|false for LineDocMaker
|
|
- OptimizeTask now takes int param to call optimize(int maxNumSegments)
|
|
- CloseIndexTask now takes bool param to call close(false) (abort running merges)
|
|
|
|
|
|
01/03/08
|
|
LUCENE-1116: quality package improvements:
|
|
- add MRR computation;
|
|
- allow control of max #queries to run;
|
|
- verify log & report are flushed.
|
|
- add TREC query reader for the 1MQ track.
|
|
|
|
12/31/07
|
|
LUCENE-1102: EnwikiDocMaker now indexes the docid field, so results might not be comparable with results prior to this change, although
|
|
it is doubted that this one small field makes much difference.
|
|
|
|
12/13/07
|
|
LUCENE-1086: DocMakers setup for the "docs.dir" property
|
|
fixed to properly handle absolute paths. (Shai Erera via Doron Cohen)
|
|
|
|
9/18/07
|
|
LUCENE-941: infinite loop for alg: {[AddDoc(4000)]: 4} : *
|
|
ResetInputsTask fixed to work also after exhaustion.
|
|
All Reset Tasks now subclas ResetInputsTask.
|
|
|
|
8/9/07
|
|
LUCENE-971: Change enwiki tasks to a doc maker (extending
|
|
LineDocMaker) that directly processes the Wikipedia XML and produces
|
|
documents. Intermediate files (one per document) are no longer
|
|
created.
|
|
|
|
8/1/07
|
|
LUCENE-967: Add "ReadTokensTask" to allow for benchmarking just tokenization.
|
|
|
|
7/27/07
|
|
LUCENE-836: Add support for search quality benchmarking, running
|
|
a set of queries against a searcher, and, optionally produce a submission
|
|
report, and, if query judgements are available, compute quality measures:
|
|
recall, precision_at_N, average_precision, MAP. TREC specific Judge (based
|
|
on TREC QRels) and TREC Topics reader are included in o.a.l.benchmark.quality.trec
|
|
but any other format of queries and judgements can be implemented and used.
|
|
|
|
7/24/07
|
|
LUCENE-947: Add support for creating and index "one document per
|
|
line" from a large text file, which reduces per-document overhead of
|
|
opening a single file for each document.
|
|
|
|
6/30/07
|
|
LUCENE-848: Added support for Wikipedia benchmarking.
|
|
|
|
6/25/07
|
|
- LUCENE-940: Multi-threaded issues fixed: SimpleDateFormat; logging for addDoc/deleteDoc tasks.
|
|
- LUCENE-945: tests fail to find data dirs. Added sys-prop benchmark.work.dir and cfg-prop work.dir.
|
|
(Doron Cohen)
|
|
|
|
4/17/07
|
|
- LUCENE-863: Deprecated StandardBenchmarker in favour of byTask code.
|
|
(Otis Gospodnetic)
|
|
|
|
4/13/07
|
|
|
|
Better error handling and javadocs around "exhaustive" doc making.
|
|
|
|
3/25/07
|
|
|
|
LUCENE-849:
|
|
1. which HTML Parser is used is configurable with html.parser property.
|
|
2. External classes added to classpath with -Dbenchmark.ext.classpath=path.
|
|
3. '*' as repeating number now means "exhaust doc maker - no repetitions".
|
|
|
|
3/22/07
|
|
|
|
-Moved withRetrieve() call out of the loop in ReadTask
|
|
-Added SearchTravRetLoadFieldSelectorTask to help benchmark some of the FieldSelector capabilities
|
|
-Added options to store content bytes on the Reuters Doc (and others, but Reuters is the only one w/ it enabled)
|
|
|
|
3/21/07
|
|
|
|
Tests (for benchmarking code correctness) were added - LUCENE-840.
|
|
To be invoked by "ant test" from contrib/benchmark. (Doron Cohen)
|
|
|
|
3/19/07
|
|
|
|
1. Introduced an AbstractQueryMaker to hold common QueryMaker code. (GSI)
|
|
2. Added traversalSize parameter to SearchTravRetTask and SearchTravTask. Changed SearchTravRetTask to extend SearchTravTask. (GSI)
|
|
3. Added FileBasedQueryMaker to run queries from a File or resource. (GSI)
|
|
4. Modified query-maker generation for read related tasks to make further read tasks addition simpler and safer. (DC)
|
|
5. Changed Taks' setParams() to throw UnsupportedOperationException if that task does not suppot command line param. (DC)
|
|
6. Improved javadoc to specify all properties command line params currently supported. (DC)
|
|
7. Refactored ReportTasks so that it is easy/possible now to create new report tasks. (DC)
|
|
|
|
01/09/07
|
|
|
|
1. Committed Doron Cohen's benchmarking contribution, which provides an easily expandable task based approach to benchmarking. See the javadocs for information. (Doron Cohen via Grant Ingersoll)
|
|
|
|
2. Added this file.
|
|
|
|
3. 2/11/07: LUCENE-790 and 788: Fixed Locale issue with date formatter. Fixed some minor issues with benchmarking by task. Added a dependency
|
|
on the Lucene demo to the build classpath. (Doron Cohen, Grant Ingersoll)
|
|
|
|
4. 2/13/07: LUCENE-801: build.xml now builds Lucene core and Demo first and has classpath dependencies on the output of that build. (Doron Cohen, Grant Ingersoll)
|