mirror of https://github.com/apache/lucene.git
LUCENE-9344: Convert .txt files to properly formatted .md files (#1449)
parent
a11b78e06a
commit
c7697b088c
|
@ -1,17 +1,17 @@
|
||||||
Lucene Build Instructions
|
# Lucene Build Instructions
|
||||||
|
|
||||||
Basic steps:
|
## Basic steps:
|
||||||
0) Install OpenJDK 11 (or greater), Ant 1.8.2+, Ivy 2.2.0
|
|
||||||
1) Download Lucene from Apache and unpack it
|
|
||||||
2) Connect to the top-level of your Lucene installation
|
|
||||||
3) Install JavaCC (optional)
|
|
||||||
4) Run ant
|
|
||||||
|
|
||||||
Step 0) Set up your development environment (OpenJDK 11 or greater,
|
0. Install OpenJDK 11 (or greater), Ant 1.8.2+, Ivy 2.2.0
|
||||||
Ant 1.8.2+, Ivy 2.2.0)
|
1. Download Lucene from Apache and unpack it
|
||||||
|
2. Connect to the top-level of your Lucene installation
|
||||||
|
3. Install JavaCC (optional)
|
||||||
|
4. Run ant
|
||||||
|
|
||||||
|
## Step 0) Set up your development environment (OpenJDK 11 or greater, Ant 1.8.2+, Ivy 2.2.0)
|
||||||
|
|
||||||
We'll assume that you know how to get and set up the JDK - if you
|
We'll assume that you know how to get and set up the JDK - if you
|
||||||
don't, then we suggest starting at http://www.oracle.com/java/ and learning
|
don't, then we suggest starting at https://www.oracle.com/java/ and learning
|
||||||
more about Java, before returning to this README. Lucene runs with
|
more about Java, before returning to this README. Lucene runs with
|
||||||
Java 11 and later.
|
Java 11 and later.
|
||||||
|
|
||||||
|
@ -22,31 +22,31 @@ Ant is "kind of like make without make's wrinkles". Ant is
|
||||||
implemented in Java and uses XML-based configuration files. You can
|
implemented in Java and uses XML-based configuration files. You can
|
||||||
get it at:
|
get it at:
|
||||||
|
|
||||||
http://ant.apache.org
|
https://ant.apache.org
|
||||||
|
|
||||||
You'll need to download the Ant binary distribution. Install it
|
You'll need to download the Ant binary distribution. Install it
|
||||||
according to the instructions at:
|
according to the instructions at:
|
||||||
|
|
||||||
http://ant.apache.org/manual
|
https://ant.apache.org/manual
|
||||||
|
|
||||||
Finally, you'll need to install Ivy into your Ant lib folder
|
Finally, you'll need to install Ivy into your Ant lib folder
|
||||||
(~/.ant/lib). You can get it from http://ant.apache.org/ivy/.
|
(~/.ant/lib). You can get it from http://ant.apache.org/ivy/.
|
||||||
If you skip this step, the Lucene build system will offer to do it
|
If you skip this step, the Lucene build system will offer to do it
|
||||||
for you.
|
for you.
|
||||||
|
|
||||||
Step 1) Download Lucene from Apache
|
## Step 1) Download Lucene from Apache
|
||||||
|
|
||||||
We'll assume you already did this, or you wouldn't be reading this
|
We'll assume you already did this, or you wouldn't be reading this
|
||||||
file. However, you might have received this file by some alternate
|
file. However, you might have received this file by some alternate
|
||||||
route, or you might have an incomplete copy of Lucene, so: Lucene
|
route, or you might have an incomplete copy of Lucene, so: Lucene
|
||||||
releases are available for download at:
|
releases are available for download at:
|
||||||
|
|
||||||
http://www.apache.org/dyn/closer.cgi/lucene/java/
|
https://www.apache.org/dyn/closer.cgi/lucene/java/
|
||||||
|
|
||||||
Download either a zip or a tarred/gzipped version of the archive, and
|
Download either a zip or a tarred/gzipped version of the archive, and
|
||||||
uncompress it into a directory of your choice.
|
uncompress it into a directory of your choice.
|
||||||
|
|
||||||
Step 2) From the command line, change (cd) into the top-level directory of your Lucene installation
|
## Step 2) From the command line, change (cd) into the top-level directory of your Lucene installation
|
||||||
|
|
||||||
Lucene's top-level directory contains the build.xml file. By default,
|
Lucene's top-level directory contains the build.xml file. By default,
|
||||||
you do not need to change any of the settings in this file, but you do
|
you do not need to change any of the settings in this file, but you do
|
||||||
|
@ -66,7 +66,7 @@ system.
|
||||||
|
|
||||||
NOTE: the ~ character represents your user account home directory.
|
NOTE: the ~ character represents your user account home directory.
|
||||||
|
|
||||||
Step 3) Run ant
|
## Step 4) Run ant
|
||||||
|
|
||||||
Assuming you have ant in your PATH and have set ANT_HOME to the
|
Assuming you have ant in your PATH and have set ANT_HOME to the
|
||||||
location of your ant installation, typing "ant" at the shell prompt
|
location of your ant installation, typing "ant" at the shell prompt
|
||||||
|
@ -76,10 +76,12 @@ and command prompt should run ant. Ant will by default look for the
|
||||||
If you want to build the documentation, type "ant documentation".
|
If you want to build the documentation, type "ant documentation".
|
||||||
|
|
||||||
For further information on Lucene, go to:
|
For further information on Lucene, go to:
|
||||||
http://lucene.apache.org/
|
|
||||||
|
https://lucene.apache.org/
|
||||||
|
|
||||||
Please join the Lucene-User mailing list by visiting this site:
|
Please join the Lucene-User mailing list by visiting this site:
|
||||||
http://lucene.apache.org/core/discussion.html
|
|
||||||
|
https://lucene.apache.org/core/discussion.html
|
||||||
|
|
||||||
Please post suggestions, questions, corrections or additions to this
|
Please post suggestions, questions, corrections or additions to this
|
||||||
document to the lucene-user mailing list.
|
document to the lucene-user mailing list.
|
||||||
|
@ -87,4 +89,4 @@ document to the lucene-user mailing list.
|
||||||
This file was originally written by Steven J. Owens <puff@darksleep.com>.
|
This file was originally written by Steven J. Owens <puff@darksleep.com>.
|
||||||
This file was modified by Jon S. Stevens <jon@latchkey.com>.
|
This file was modified by Jon S. Stevens <jon@latchkey.com>.
|
||||||
|
|
||||||
Copyright (c) 2001-2005 The Apache Software Foundation. All rights reserved.
|
Copyright (c) 2001-2020 The Apache Software Foundation. All rights reserved.
|
|
@ -115,6 +115,8 @@ Other
|
||||||
* LUCENE-8656: Deprecations in FuzzyQuery and get compiler warnings out of
|
* LUCENE-8656: Deprecations in FuzzyQuery and get compiler warnings out of
|
||||||
queryparser code (Alan Woodward, Erick Erickson)
|
queryparser code (Alan Woodward, Erick Erickson)
|
||||||
|
|
||||||
|
* LUCENE-9344: Convert .txt files to properly formatted .md files. (Tomoko Uchida, Uwe Schindler)
|
||||||
|
|
||||||
======================= Lucene 8.6.0 =======================
|
======================= Lucene 8.6.0 =======================
|
||||||
|
|
||||||
API Changes
|
API Changes
|
||||||
|
|
|
@ -19,16 +19,16 @@ For reference, JRE major versions with their corresponding Unicode versions:
|
||||||
* Java 8, Unicode 6.2
|
* Java 8, Unicode 6.2
|
||||||
* Java 9, Unicode 8.0
|
* Java 9, Unicode 8.0
|
||||||
|
|
||||||
In general, whether or not you need to re-index largely depends upon the data that
|
In general, whether you need to re-index largely depends upon the data that
|
||||||
you are searching, and what was changed in any given Unicode version. For example,
|
you are searching, and what was changed in any given Unicode version. For example,
|
||||||
if you are completely sure that your content is limited to the "Basic Latin" range
|
if you are completely sure your content is limited to the "Basic Latin" range
|
||||||
of Unicode, you can safely ignore this.
|
of Unicode, you can safely ignore this.
|
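The "Basic Latin" condition above is a quick test you can run on your own data. Below is a minimal Java sketch that reports whether every code point in a string falls in the Basic Latin block (U+0000 to U+007F). It uses only the JDK's `Character.UnicodeBlock`, and the class name `BasicLatinCheck` is hypothetical. A true result only covers the text you actually scanned, not content you may index later.

```java
import java.lang.Character.UnicodeBlock;

/** Hypothetical helper: is every code point of the text inside the Basic Latin block? */
final class BasicLatinCheck {
  private BasicLatinCheck() {}

  static boolean isBasicLatin(CharSequence text) {
    // codePoints() yields supplementary characters as single code points, not surrogate halves
    return text.codePoints().allMatch(cp -> UnicodeBlock.of(cp) == UnicodeBlock.BASIC_LATIN);
  }

  public static void main(String[] args) {
    System.out.println(isBasicLatin("Lucene 8.6.0")); // true
    System.out.println(isBasicLatin("Z\u00fcrich"));   // false: U+00FC belongs to Latin-1 Supplement
  }
}
```

In practice you would run such a check over a sample or over all stored field values before deciding that a JRE Unicode upgrade can be ignored.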
||||||
|
|
||||||
## Special Notes: LUCENE 2.9 TO 3.0, JAVA 1.4 TO JAVA 5 TRANSITION
|
## Special Notes: LUCENE 2.9 TO 3.0, JAVA 1.4 TO JAVA 5 TRANSITION
|
||||||
|
|
||||||
* `StandardAnalyzer` will return the same results under Java 5 as it did under
|
* `StandardAnalyzer` will return the same results under Java 5 as it did under
|
||||||
Java 1.4. This is because it is largely independent of the runtime JRE for
|
Java 1.4. This is because it is largely independent of the runtime JRE for
|
||||||
Unicode support, (with the exception of lowercasing). However, no changes to
|
Unicode support (except for lowercasing). However, no changes to
|
||||||
casing have occurred in Unicode 4.0 that affect StandardAnalyzer, so if you are
|
casing have occurred in Unicode 4.0 that affect StandardAnalyzer, so if you are
|
||||||
using this Analyzer you are NOT affected.
|
using this Analyzer you are NOT affected.
|
||||||
|
|
|
@ -1,33 +1,35 @@
|
||||||
# Apache Lucene Migration Guide
|
# Apache Lucene Migration Guide
|
||||||
|
|
||||||
## NGramFilterFactory "keepShortTerm" option was fixed to "preserveOriginal" (LUCENE-9259) ##
|
## NGramFilterFactory "keepShortTerm" option was fixed to "preserveOriginal" (LUCENE-9259)
|
||||||
|
|
||||||
The factory option name to output the original term was corrected in accordance with its Javadoc.
|
The factory option name to output the original term was corrected in accordance with its Javadoc.
|
||||||
|
|
||||||
## o.a.l.misc.IndexMergeTool defaults changes (LUCENE-9206) ##
|
## o.a.l.misc.IndexMergeTool defaults changes (LUCENE-9206)
|
||||||
|
|
||||||
This command-line tool no longer forceMerges to a single segment. Instead, by
|
This command-line tool no longer forceMerges to a single segment. Instead, by
|
||||||
default it just follows the (configurable) merge policy. If you really want to merge
|
default it just follows the (configurable) merge policy. If you really want to merge
|
||||||
to a single segment, you can pass -max-segments 1.
|
to a single segment, you can pass -max-segments 1.
|
||||||
|
|
||||||
## o.a.l.util.fst.Builder is renamed FSTCompiler with fluent-style Builder (LUCENE-9089) ##
|
## o.a.l.util.fst.Builder is renamed FSTCompiler with fluent-style Builder (LUCENE-9089)
|
||||||
|
|
||||||
Simply use FSTCompiler instead of the previous Builder. Use either the simple constructor with default settings, or
|
Simply use FSTCompiler instead of the previous Builder. Use either the simple constructor with default settings, or
|
||||||
the FSTCompiler.Builder to tune and tweak any parameter.
|
the FSTCompiler.Builder to tune and tweak any parameter.
|
||||||
|
|
||||||
## Kuromoji user dictionary now forbids illegal segmentation (LUCENE-8933) ##
|
## Kuromoji user dictionary now forbids illegal segmentation (LUCENE-8933)
|
||||||
|
|
||||||
User dictionary now strictly validates whether the (concatenated) segment is the same as the surface form. This change avoids
|
User dictionary now strictly validates whether the (concatenated) segment is the same as the surface form. This change avoids
|
||||||
unexpected runtime exceptions or behaviours.
|
unexpected runtime exceptions or behaviours.
|
||||||
For example, these entries are not allowed at all and an exception is thrown when loading the dictionary file.
|
For example, these entries are not allowed at all and an exception is thrown when loading the dictionary file.
|
||||||
|
|
||||||
|
```
|
||||||
# concatenated "日本経済新聞" does not match the surface form "日経新聞"
|
# concatenated "日本経済新聞" does not match the surface form "日経新聞"
|
||||||
日経新聞,日本 経済 新聞,ニホン ケイザイ シンブン,カスタム名詞
|
日経新聞,日本 経済 新聞,ニホン ケイザイ シンブン,カスタム名詞
|
||||||
|
|
||||||
# concatenated "日経新聞" does not match the surface form "日本経済新聞"
|
# concatenated "日経新聞" does not match the surface form "日本経済新聞"
|
||||||
日本経済新聞,日経 新聞,ニッケイ シンブン,カスタム名詞
|
日本経済新聞,日経 新聞,ニッケイ シンブン,カスタム名詞
|
||||||
|
```
|
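The validation rule works on a single CSV entry: split the segmentation on spaces, concatenate the pieces, and compare the result with the surface form. Below is a minimal sketch of that check. The helper `UserDictEntryCheck` is hypothetical. It assumes the four comma-separated fields shown in the examples and skips comment lines. It is not Lucene's dictionary loader, which throws an exception instead of returning false.

```java
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical helper illustrating the documented rule: the space-separated
 * segmentation of a user dictionary entry, once concatenated, must equal its
 * surface form. Not Lucene's loader; it returns false instead of throwing.
 */
final class UserDictEntryCheck {
  private UserDictEntryCheck() {}

  /** Expects "surface,segmentation,readings,part-of-speech" (no "#" comment lines). */
  static boolean isLegal(String csvLine) {
    String[] fields = csvLine.split(",");
    if (fields.length < 4) {
      throw new IllegalArgumentException("expected 4 comma-separated fields: " + csvLine);
    }
    String surface = fields[0].trim();
    // Drop the spaces between segments and glue the pieces back together.
    String concatenated = String.join("", fields[1].trim().split(" +"));
    return concatenated.equals(surface);
  }

  public static void main(String[] args) {
    List<String> entries = Arrays.asList(
        "日本経済新聞,日本 経済 新聞,ニホン ケイザイ シンブン,カスタム名詞",
        "日経新聞,日本 経済 新聞,ニホン ケイザイ シンブン,カスタム名詞",
        "日本経済新聞,日経 新聞,ニッケイ シンブン,カスタム名詞");
    for (String entry : entries) {
      System.out.println((isLegal(entry) ? "ok       " : "rejected ") + entry);
    }
  }
}
```

Running this sketch over a dictionary file before deploying it surfaces the entries that the stricter loader would reject.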
||||||
|
|
||||||
## JapaneseTokenizer no longer emits original (compound) tokens by default when the mode is not NORMAL (LUCENE-9123) ##
|
## JapaneseTokenizer no longer emits original (compound) tokens by default when the mode is not NORMAL (LUCENE-9123)
|
||||||
|
|
||||||
JapaneseTokenizer and JapaneseAnalyzer no longer emit original tokens when discardCompoundToken option is not specified.
|
JapaneseTokenizer and JapaneseAnalyzer no longer emit original tokens when discardCompoundToken option is not specified.
|
||||||
The constructor option was introduced in Lucene 8.5.0, and its default value has been changed to true.
|
The constructor option was introduced in Lucene 8.5.0, and its default value has been changed to true.
|
||||||
|
@ -37,13 +39,15 @@ longer outputs the original token "株式会社" by default. To output original
|
||||||
explicitly set to false. Be aware that if this option is set to false, SynonymFilter or SynonymGraphFilter does not work
|
explicitly set to false. Be aware that if this option is set to false, SynonymFilter or SynonymGraphFilter does not work
|
||||||
correctly (see LUCENE-9173).
|
correctly (see LUCENE-9173).
|
||||||
|
|
||||||
## Analysis factories now have customizable symbolic names (LUCENE-8778) and need additional no-arg constructor (LUCENE-9281) ##
|
## Analysis factories now have customizable symbolic names (LUCENE-8778) and need additional no-arg constructor (LUCENE-9281)
|
||||||
|
|
||||||
The SPI names for concrete subclasses of TokenizerFactory, TokenFilterFactory, and CharFilterFactory are no longer
|
The SPI names for concrete subclasses of TokenizerFactory, TokenFilterFactory, and CharFilterFactory are no longer
|
||||||
derived from their class name. Instead, each factory must have a static "NAME" field like this:
|
derived from their class name. Instead, each factory must have a static "NAME" field like this:
|
||||||
|
|
||||||
|
```
|
||||||
/** o.a.l.a.standard.StandardTokenizerFactory's SPI name */
|
/** o.a.l.a.standard.StandardTokenizerFactory's SPI name */
|
||||||
public static final String NAME = "standard";
|
public static final String NAME = "standard";
|
||||||
|
```
|
||||||
|
|
||||||
A factory can be resolved/instantiated with its NAME by using methods such as TokenizerFactory#lookupClass(String)
|
A factory can be resolved/instantiated with its NAME by using methods such as TokenizerFactory#lookupClass(String)
|
||||||
or TokenizerFactory#forName(String, Map<String,String>).
|
or TokenizerFactory#forName(String, Map<String,String>).
|
||||||
|
@ -60,35 +64,37 @@ In the future, extensions to Lucene developed on the Java Module System may expo
|
||||||
This constructor is never called by Lucene, so by default it throws an UnsupportedOperationException. User-defined
|
This constructor is never called by Lucene, so by default it throws an UnsupportedOperationException. User-defined
|
||||||
factory classes should implement it in the following way:
|
factory classes should implement it in the following way:
|
||||||
|
|
||||||
|
```
|
||||||
/** Default ctor for compatibility with SPI */
|
/** Default ctor for compatibility with SPI */
|
||||||
public StandardTokenizerFactory() {
|
public StandardTokenizerFactory() {
|
||||||
throw defaultCtorException();
|
throw defaultCtorException();
|
||||||
}
|
}
|
||||||
|
```
|
||||||
|
|
||||||
(`defaultCtorException()` is a protected static helper method)
|
(`defaultCtorException()` is a protected static helper method)
|
||||||
|
|
||||||
## TermsEnum is now fully abstract (LUCENE-8292) ##
|
## TermsEnum is now fully abstract (LUCENE-8292)
|
||||||
|
|
||||||
TermsEnum has been changed to be fully abstract, so non-abstract subclasses must implement all its methods.
|
TermsEnum has been changed to be fully abstract, so non-abstract subclasses must implement all its methods.
|
||||||
Non-performance-critical TermsEnums can use BaseTermsEnum as a base class instead. The change was motivated
|
Non-performance-critical TermsEnums can use BaseTermsEnum as a base class instead. The change was motivated
|
||||||
by several performance issues with FilterTermsEnum that caused significant slowdowns and massive memory consumption due
|
by several performance issues with FilterTermsEnum that caused significant slowdowns and massive memory consumption due
|
||||||
to not delegating all methods from TermsEnum. See LUCENE-8292 and LUCENE-8662.
|
to not delegating all methods from TermsEnum. See LUCENE-8292 and LUCENE-8662.
|
||||||
|
|
||||||
## RAMDirectory, RAMFile, RAMInputStream, RAMOutputStream removed ##
|
## RAMDirectory, RAMFile, RAMInputStream, RAMOutputStream removed
|
||||||
|
|
||||||
RAM-based directory implementations have been removed (LUCENE-8474).
|
RAM-based directory implementations have been removed (LUCENE-8474).
|
||||||
ByteBuffersDirectory can be used as a RAM-resident replacement, although it
|
ByteBuffersDirectory can be used as a RAM-resident replacement, although it
|
||||||
is discouraged in favor of the default memory-mapped directory.
|
is discouraged in favor of the default memory-mapped directory.
|
||||||
|
|
||||||
|
|
||||||
## Similarity.SimScorer.computeXXXFactor methods removed (LUCENE-8014) ##
|
## Similarity.SimScorer.computeXXXFactor methods removed (LUCENE-8014)
|
||||||
|
|
||||||
SpanQuery and PhraseQuery now always calculate their slops as (1.0 / (1.0 +
|
SpanQuery and PhraseQuery now always calculate their slops as (1.0 / (1.0 +
|
||||||
distance)). Payload factor calculation is performed by PayloadDecoder in the
|
distance)). Payload factor calculation is performed by PayloadDecoder in the
|
||||||
queries module.
|
queries module.
|
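Here is the quoted formula worked through in plain Java. The sketch computes the per-match factor 1 / (1 + distance). It also shows how a document with one exact match and two looser matches accumulates, assuming the factors of all matches in a document are summed into a single sloppy frequency. The class and method names are hypothetical and this is not Lucene's scorer code.

```java
/** Hypothetical sketch of the per-match slop factor 1 / (1 + distance). */
final class SloppyFreqSketch {
  private SloppyFreqSketch() {}

  /** An exact match (distance 0) counts fully; farther matches count less. */
  static float sloppyFreq(int distance) {
    return 1.0f / (1.0f + distance);
  }

  /** Assumption: a document's sloppy frequency is the sum over its matches. */
  static float accumulate(int... distances) {
    float freq = 0f;
    for (int d : distances) {
      freq += sloppyFreq(d);
    }
    return freq;
  }

  public static void main(String[] args) {
    System.out.println(sloppyFreq(0));       // 1.0
    System.out.println(sloppyFreq(1));       // 0.5
    System.out.println(accumulate(0, 1, 3)); // 1.0 + 0.5 + 0.25 = 1.75
  }
}
```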
||||||
|
|
||||||
|
|
||||||
## Scorer must produce positive scores (LUCENE-7996) ##
|
## Scorer must produce positive scores (LUCENE-7996)
|
||||||
|
|
||||||
Scorers are no longer allowed to produce negative scores. If you have custom
|
Scorers are no longer allowed to produce negative scores. If you have custom
|
||||||
query implementations, you should make sure their score formula can never produce
|
query implementations, you should make sure their score formula can never produce
|
||||||
|
@ -98,21 +104,23 @@ As a side-effect of this change, negative boosts are now rejected and
|
||||||
FunctionScoreQuery maps negative values to 0.
|
FunctionScoreQuery maps negative values to 0.
|
||||||
|
|
||||||
|
|
||||||
## CustomScoreQuery, BoostedQuery and BoostingQuery removed (LUCENE-8099) ##
|
## CustomScoreQuery, BoostedQuery and BoostingQuery removed (LUCENE-8099)
|
||||||
|
|
||||||
Instead use FunctionScoreQuery and a DoubleValuesSource implementation. BoostedQuery
|
Instead use FunctionScoreQuery and a DoubleValuesSource implementation. BoostedQuery
|
||||||
and BoostingQuery may be replaced by calls to FunctionScoreQuery.boostByValue() and
|
and BoostingQuery may be replaced by calls to FunctionScoreQuery.boostByValue() and
|
||||||
FunctionScoreQuery.boostByQuery(). To replace more complex calculations in
|
FunctionScoreQuery.boostByQuery(). To replace more complex calculations in
|
||||||
CustomScoreQuery, use the lucene-expressions module:
|
CustomScoreQuery, use the lucene-expressions module:
|
||||||
|
|
||||||
|
```
|
||||||
SimpleBindings bindings = new SimpleBindings();
|
SimpleBindings bindings = new SimpleBindings();
|
||||||
bindings.add("score", DoubleValuesSource.SCORES);
|
bindings.add("score", DoubleValuesSource.SCORES);
|
||||||
bindings.add("boost1", DoubleValuesSource.fromIntField("myboostfield"));
|
bindings.add("boost1", DoubleValuesSource.fromIntField("myboostfield"));
|
||||||
bindings.add("boost2", DoubleValuesSource.fromIntField("myotherboostfield"));
|
bindings.add("boost2", DoubleValuesSource.fromIntField("myotherboostfield"));
|
||||||
Expression expr = JavascriptCompiler.compile("score * (boost1 + ln(boost2))");
|
Expression expr = JavascriptCompiler.compile("score * (boost1 + ln(boost2))");
|
||||||
FunctionScoreQuery q = new FunctionScoreQuery(inputQuery, expr.getDoubleValuesSource(bindings));
|
FunctionScoreQuery q = new FunctionScoreQuery(inputQuery, expr.getDoubleValuesSource(bindings));
|
||||||
|
```
|
||||||
|
|
||||||
## Index options can no longer be changed dynamically (LUCENE-8134) ##
|
## Index options can no longer be changed dynamically (LUCENE-8134)
|
||||||
|
|
||||||
Changing index options on the fly is now going to result in an
|
Changing index options on the fly is now going to result in an
|
||||||
IllegalArgumentException. If a field is indexed
|
IllegalArgumentException. If a field is indexed
|
||||||
|
@ -120,62 +128,64 @@ IllegalArgumentException. If a field is indexed
|
||||||
the same index options for that field.
|
the same index options for that field.
|
||||||
|
|
||||||
|
|
||||||
## IndexSearcher.createNormalizedWeight() removed (LUCENE-8242) ##
|
## IndexSearcher.createNormalizedWeight() removed (LUCENE-8242)
|
||||||
|
|
||||||
Instead use IndexSearcher.createWeight(), rewriting the query first, and using
|
Instead use IndexSearcher.createWeight(), rewriting the query first, and using
|
||||||
a boost of 1f.
|
a boost of 1f.
|
||||||
|
|
||||||
## Memory codecs removed (LUCENE-8267) ##
|
## Memory codecs removed (LUCENE-8267)
|
||||||
|
|
||||||
Memory codecs have been removed from the codebase (MemoryPostings, MemoryDocValues).
|
Memory codecs have been removed from the codebase (MemoryPostings, MemoryDocValues).
|
||||||
|
|
||||||
## Direct doc-value format removed (LUCENE-8917) ##
|
## Direct doc-value format removed (LUCENE-8917)
|
||||||
|
|
||||||
The "Direct" doc-value format has been removed from the codebase.
|
The "Direct" doc-value format has been removed from the codebase.
|
||||||
|
|
||||||
## QueryCachingPolicy.ALWAYS_CACHE removed (LUCENE-8144) ##
|
## QueryCachingPolicy.ALWAYS_CACHE removed (LUCENE-8144)
|
||||||
|
|
||||||
Caching everything is discouraged as it disables the ability to skip non-interesting documents.
|
Caching everything is discouraged as it disables the ability to skip non-interesting documents.
|
||||||
ALWAYS_CACHE can be replaced by a UsageTrackingQueryCachingPolicy with an appropriate config.
|
ALWAYS_CACHE can be replaced by a UsageTrackingQueryCachingPolicy with an appropriate config.
|
||||||
|
|
||||||
## English stopwords are no longer removed by default in StandardAnalyzer (LUCENE_7444) ##
|
## English stopwords are no longer removed by default in StandardAnalyzer (LUCENE-7444)
|
||||||
|
|
||||||
To retain the old behaviour, pass EnglishAnalyzer.ENGLISH_STOP_WORDS_SET as an argument
|
To retain the old behaviour, pass EnglishAnalyzer.ENGLISH_STOP_WORDS_SET as an argument
|
||||||
to the constructor.
|
to the constructor.
|
||||||
|
|
||||||
## StandardAnalyzer.ENGLISH_STOP_WORDS_SET has been moved ##
|
## StandardAnalyzer.ENGLISH_STOP_WORDS_SET has been moved
|
||||||
|
|
||||||
English stop words are now defined in EnglishAnalyzer#ENGLISH_STOP_WORDS_SET in the
|
English stop words are now defined in EnglishAnalyzer#ENGLISH_STOP_WORDS_SET in the
|
||||||
analysis-common module.
|
analysis-common module.
|
||||||
|
|
||||||
## TopDocs.maxScore removed ##
|
## TopDocs.maxScore removed
|
||||||
|
|
||||||
TopDocs.maxScore is removed. IndexSearcher and TopFieldCollector no longer have
|
TopDocs.maxScore is removed. IndexSearcher and TopFieldCollector no longer have
|
||||||
an option to compute the maximum score when sorting by field. If you need to
|
an option to compute the maximum score when sorting by field. If you need to
|
||||||
know the maximum score for a query, the recommended approach is to run a
|
know the maximum score for a query, the recommended approach is to run a
|
||||||
separate query:
|
separate query:
|
||||||
|
|
||||||
|
```
|
||||||
TopDocs topHits = searcher.search(query, 1);
|
TopDocs topHits = searcher.search(query, 1);
|
||||||
float maxScore = topHits.scoreDocs.length == 0 ? Float.NaN : topHits.scoreDocs[0].score;
|
float maxScore = topHits.scoreDocs.length == 0 ? Float.NaN : topHits.scoreDocs[0].score;
|
||||||
|
```
|
||||||
|
|
||||||
Thanks to other optimizations that were added to Lucene 8, this query will be
|
Thanks to other optimizations that were added to Lucene 8, this query will be
|
||||||
able to efficiently select the top-scoring document without having to visit
|
able to efficiently select the top-scoring document without having to visit
|
||||||
all matches.
|
all matches.
|
||||||
|
|
||||||
## TopFieldCollector always assumes fillFields=true ##
|
## TopFieldCollector always assumes fillFields=true
|
||||||
|
|
||||||
Because filling sort values doesn't have a significant overhead, the fillFields
|
Because filling sort values doesn't have a significant overhead, the fillFields
|
||||||
option has been removed from TopFieldCollector factory methods. Everything
|
option has been removed from TopFieldCollector factory methods. Everything
|
||||||
behaves as if it was set to true.
|
behaves as if it was set to true.
|
||||||
|
|
||||||
## TopFieldCollector no longer takes a trackDocScores option ##
|
## TopFieldCollector no longer takes a trackDocScores option
|
||||||
|
|
||||||
Computing scores at collection time is less efficient than running a second
|
Computing scores at collection time is less efficient than running a second
|
||||||
request in order to only compute scores for documents that made it to the top
|
request in order to only compute scores for documents that made it to the top
|
||||||
hits. As a consequence, the trackDocScores option has been removed and can be
|
hits. As a consequence, the trackDocScores option has been removed and can be
|
||||||
replaced with the new TopFieldCollector#populateScores helper method.
|
replaced with the new TopFieldCollector#populateScores helper method.
|
||||||
|
|
||||||
## IndexSearcher.search(After) may return lower bounds of the hit count and TopDocs.totalHits is no longer a long ##
|
## IndexSearcher.search(After) may return lower bounds of the hit count and TopDocs.totalHits is no longer a long
|
||||||
|
|
||||||
Lucene 8 received optimizations for collection of top-k matches by not visiting
|
Lucene 8 received optimizations for collection of top-k matches by not visiting
|
||||||
all matches. However, these optimizations won't help if all matches still need
|
all matches. However, these optimizations won't help if all matches still need
|
||||||
|
@ -185,37 +195,36 @@ accurately up to 1,000, and Topdocs.totalHits was changed from a long to an
|
||||||
object that says whether the hit count is accurate or a lower bound of the
|
object that says whether the hit count is accurate or a lower bound of the
|
||||||
actual hit count.
|
actual hit count.
|
||||||
|
|
||||||
## RAMDirectory, RAMFile, RAMInputStream, RAMOutputStream are deprecated ##
|
## RAMDirectory, RAMFile, RAMInputStream, RAMOutputStream are deprecated
|
||||||
|
|
||||||
This RAM-based directory implementation is an old piece of code that uses inefficient
|
This RAM-based directory implementation is an old piece of code that uses inefficient
|
||||||
thread synchronization primitives and can be mistakenly assumed to be "faster" than the NIO-based
|
thread synchronization primitives and can be mistakenly assumed to be "faster" than the NIO-based
|
||||||
MMapDirectory. It is deprecated and scheduled for removal in future versions of
|
MMapDirectory. It is deprecated and scheduled for removal in future versions of
|
||||||
Lucene. (LUCENE-8467, LUCENE-8438)
|
Lucene. (LUCENE-8467, LUCENE-8438)
|
||||||
|
|
||||||
## LeafCollector.setScorer() now takes a Scorable rather than a Scorer ##
|
## LeafCollector.setScorer() now takes a Scorable rather than a Scorer
|
||||||
|
|
||||||
Scorer has a number of methods that should never be called from Collectors, for example
|
Scorer has a number of methods that should never be called from Collectors, for example
|
||||||
those that advance the underlying iterators. To hide these, LeafCollector.setScorer()
|
those that advance the underlying iterators. To hide these, LeafCollector.setScorer()
|
||||||
now takes a Scorable, an abstract class that Scorers can extend, with methods
|
now takes a Scorable, an abstract class that Scorers can extend, with methods
|
||||||
docID() and score() (LUCENE-6228).
|
docID() and score() (LUCENE-6228).
|
||||||
|
|
||||||
## Scorers must have non-null Weights ##
|
## Scorers must have non-null Weights
|
||||||
|
|
||||||
If a custom Scorer implementation does not have an associated Weight, it can probably
|
If a custom Scorer implementation does not have an associated Weight, it can probably
|
||||||
be replaced with a Scorable instead.
|
be replaced with a Scorable instead.
|
||||||
|
|
||||||
## Suggesters now return Long instead of long for weight() during indexing, and double
|
## Suggesters now return Long instead of long for weight() during indexing, and double instead of long at suggest time
|
||||||
instead of long at suggest time ##
|
|
||||||
|
|
||||||
Most code should only require recompilation, possibly with some added casts.
|
Most code should only require recompilation, possibly with some added casts.
|
||||||
|
|
||||||
## TokenStreamComponents is now final ##
|
## TokenStreamComponents is now final
|
||||||
|
|
||||||
Instead of overriding TokenStreamComponents#setReader() to customise analyzer
|
Instead of overriding TokenStreamComponents#setReader() to customise analyzer
|
||||||
initialisation, you should now pass a Consumer<Reader> instance to the
|
initialisation, you should now pass a Consumer<Reader> instance to the
|
||||||
TokenStreamComponents constructor.
|
TokenStreamComponents constructor.
|
||||||
|
|
||||||
## LowerCaseTokenizer and LowerCaseTokenizerFactory have been removed ##
|
## LowerCaseTokenizer and LowerCaseTokenizerFactory have been removed
|
||||||
|
|
||||||
LowerCaseTokenizer combined tokenization and filtering in a way that broke token
|
LowerCaseTokenizer combined tokenization and filtering in a way that broke token
|
||||||
normalization, so they have been removed. Instead, use a LetterTokenizer followed by
|
normalization, so they have been removed. Instead, use a LetterTokenizer followed by
|
||||||
|
@ -231,12 +240,12 @@ use a TokenFilter chain as you would with any other Tokenizer.
|
||||||
Both Highlighter and FastVectorHighlighter need a custom WeightedSpanTermExtractor or FieldQuery respectively
|
Both Highlighter and FastVectorHighlighter need a custom WeightedSpanTermExtractor or FieldQuery respectively
|
||||||
in order to support ToParent/ToChildBlockJoinQuery.
|
in order to support ToParent/ToChildBlockJoinQuery.
|
||||||
|
|
||||||
## MultiTermAwareComponent replaced by CharFilterFactory#normalize() and TokenFilterFactory#normalize() ##
|
## MultiTermAwareComponent replaced by CharFilterFactory#normalize() and TokenFilterFactory#normalize()
|
||||||
|
|
||||||
Normalization is now type-safe, with CharFilterFactory#normalize() returning a Reader and
|
Normalization is now type-safe, with CharFilterFactory#normalize() returning a Reader and
|
||||||
TokenFilterFactory#normalize() returning a TokenFilter.
|
TokenFilterFactory#normalize() returning a TokenFilter.
|
||||||
|
|
||||||
## k1+1 constant factor removed from BM25 similarity numerator (LUCENE-8563) ##
|
## k1+1 constant factor removed from BM25 similarity numerator (LUCENE-8563)
|
||||||
|
|
||||||
Scores computed by the BM25 similarity are lower than before, as the k1+1
|
Scores computed by the BM25 similarity are lower than before, as the k1+1
|
||||||
constant factor was removed from the numerator of the scoring formula.
|
constant factor was removed from the numerator of the scoring formula.
|
||||||
|
@ -244,17 +253,18 @@ Ordering of results is preserved unless scores are computed from multiple
|
||||||
fields using different similarities. The previous behaviour is now exposed
|
fields using different similarities. The previous behaviour is now exposed
|
||||||
by the LegacyBM25Similarity class which can be found in the lucene-misc jar.
|
by the LegacyBM25Similarity class which can be found in the lucene-misc jar.
|
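The effect of dropping the k1+1 factor can be checked numerically. The sketch below implements the textbook BM25 term score in both shapes. It takes a precomputed idf, uses the common defaults k1 = 1.2 and b = 0.75, and uses raw document lengths rather than Lucene's encoded norms. It therefore illustrates the arithmetic rather than reproducing BM25Similarity or LegacyBM25Similarity exactly. The class name is hypothetical.

```java
/** Hypothetical sketch: BM25 term score with and without the (k1 + 1) numerator factor. */
final class Bm25NumeratorSketch {
  private Bm25NumeratorSketch() {}

  static final double K1 = 1.2;
  static final double B = 0.75;

  /** Length-normalization term shared by both shapes. */
  private static double norm(double docLen, double avgDocLen) {
    return K1 * (1 - B + B * docLen / avgDocLen);
  }

  /** Newer shape: the numerator is just idf * freq. */
  static double score(double idf, double freq, double docLen, double avgDocLen) {
    return idf * freq / (freq + norm(docLen, avgDocLen));
  }

  /** Older shape: the numerator also carries the constant (k1 + 1). */
  static double legacyScore(double idf, double freq, double docLen, double avgDocLen) {
    return idf * (K1 + 1) * freq / (freq + norm(docLen, avgDocLen));
  }

  public static void main(String[] args) {
    double idf = 2.0;
    double avg = 10.0;
    double[][] docs = {{3, 8}, {1, 12}, {5, 20}}; // {freq, docLen}
    for (double[] d : docs) {
      double s = score(idf, d[0], d[1], avg);
      double legacy = legacyScore(idf, d[0], d[1], avg);
      // The ratio is always k1 + 1, so sorting by either score gives the same order.
      System.out.printf("new=%.4f legacy=%.4f ratio=%.4f%n", s, legacy, legacy / s);
    }
  }
}
```

Every legacy score is the new score multiplied by the same constant, so ranking within one similarity is unchanged. Now consider a query that sums scores from fields using different similarities. The BM25 part shrinks by that constant while the other parts do not, and that is how ordering can change in that case.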
||||||
|
|
||||||
## IndexWriter#maxDoc()/#numDocs() removed in favor of IndexWriter#getDocStats() ##
|
## IndexWriter#maxDoc()/#numDocs() removed in favor of IndexWriter#getDocStats()
|
||||||
|
|
||||||
IndexWriter#getDocStats() should be used instead of #maxDoc() / #numDocs() which offers a consistent
view on document stats. Previously, calling two methods in order to get point-in-time stats was subject
to concurrent changes.
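The consistency argument can be illustrated in plain Java. The `Counters` and `Stats` classes below are hypothetical. They are not Lucene's IndexWriter or IndexWriter.DocStats and only show the general idea of publishing both counts in one immutable snapshot. With two separate getters, a reader could read maxDoc, then read numDocs after more documents were added, and end up with a numDocs larger than the maxDoc it saw. A single snapshot read cannot produce such a pair.

```java
import java.util.concurrent.atomic.AtomicReference;

public class DocStatsSnapshotDemo {
    // Immutable pair of counts; a hypothetical stand-in, not Lucene's DocStats.
    static final class Stats {
        final int maxDoc;
        final int numDocs;

        Stats(int maxDoc, int numDocs) {
            this.maxDoc = maxDoc;
            this.numDocs = numDocs;
        }

        @Override
        public String toString() {
            return "maxDoc=" + maxDoc + ", numDocs=" + numDocs;
        }
    }

    // Hypothetical counter holder that always updates both counts together.
    static final class Counters {
        private final AtomicReference<Stats> current = new AtomicReference<>(new Stats(0, 0));

        void addDocument() {
            current.updateAndGet(s -> new Stats(s.maxDoc + 1, s.numDocs + 1));
        }

        // A delete lowers numDocs but leaves maxDoc unchanged.
        void deleteDocument() {
            current.updateAndGet(s -> new Stats(s.maxDoc, s.numDocs - 1));
        }

        // A single read yields a consistent point-in-time view of both values.
        Stats getStats() {
            return current.get();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        Counters counters = new Counters();
        Thread writer = new Thread(() -> {
            for (int i = 0; i < 10_000; i++) {
                counters.addDocument();
                if (i % 2 == 0) {
                    counters.deleteDocument();
                }
            }
        });
        writer.start();
        // Every snapshot satisfies numDocs <= maxDoc, even while the writer is running.
        while (writer.isAlive()) {
            Stats s = counters.getStats();
            if (s.numDocs > s.maxDoc) {
                throw new IllegalStateException("inconsistent snapshot: " + s);
            }
        }
        writer.join();
        System.out.println(counters.getStats());
    }
}
```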
## maxClausesCount moved from BooleanQuery To IndexSearcher (LUCENE-8811) ##
## maxClausesCount moved from BooleanQuery To IndexSearcher (LUCENE-8811)

IndexSearcher now performs max clause count checks on all types of queries (including BooleanQueries).
This led to a logical move of the clauses count from BooleanQuery to IndexSearcher.

## TopDocs.merge shall no longer allow setting of shard indices ##
## TopDocs.merge shall no longer allow setting of shard indices

TopDocs.merge's API has been changed to stop allowing passing in a parameter to indicate if it should
set shard indices for hits as they are seen during the merge process. This is done to simplify the API
@@ -262,7 +272,7 @@ to be more dynamic in terms of passing in custom tie breakers.
If shard indices are to be used for tie breaking docs with equal scores during TopDocs.merge, then it is
mandatory that the input ScoreDocs have their shard indices set to valid values prior to calling TopDocs.merge.
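A self-contained sketch can illustrate the tie-breaking requirement. The `Hit` class and `merge` method below are hypothetical stand-ins for ScoreDoc and TopDocs.merge, not Lucene's implementation. Hits are ordered by descending score, and equal scores are broken by shard index and then by doc id. The sketch rejects hits with a negative shard index to stand in for the "valid values" requirement. Treating negative values as unset is an assumption of this example.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

public class ShardMergeDemo {
    // Hypothetical stand-in for a per-shard hit; not Lucene's ScoreDoc.
    static final class Hit {
        final float score;
        final int shardIndex;
        final int doc;

        Hit(float score, int shardIndex, int doc) {
            this.score = score;
            this.shardIndex = shardIndex;
            this.doc = doc;
        }

        @Override
        public String toString() {
            return "shard" + shardIndex + "/doc" + doc + "(" + score + ")";
        }
    }

    // Higher score first; equal scores fall back to shard index, then doc id.
    static final Comparator<Hit> ORDER = Comparator
            .comparingDouble((Hit h) -> h.score).reversed()
            .thenComparingInt(h -> h.shardIndex)
            .thenComparingInt(h -> h.doc);

    // Collects hits from all shards and returns the global top n in ORDER.
    static List<Hit> merge(int topN, List<List<Hit>> shards) {
        PriorityQueue<Hit> queue = new PriorityQueue<>(ORDER);
        for (List<Hit> shard : shards) {
            for (Hit h : shard) {
                if (h.shardIndex < 0) {
                    throw new IllegalArgumentException("shard index not set: " + h);
                }
                queue.add(h);
            }
        }
        List<Hit> result = new ArrayList<>();
        while (!queue.isEmpty() && result.size() < topN) {
            result.add(queue.poll());
        }
        return result;
    }

    public static void main(String[] args) {
        List<List<Hit>> shards = List.of(
                List.of(new Hit(2.0f, 1, 7), new Hit(1.0f, 1, 3)),
                List.of(new Hit(2.0f, 0, 9), new Hit(1.5f, 0, 4)));
        // The two 2.0 hits tie on score, so the shard 0 hit sorts before the shard 1 hit.
        System.out.println(merge(3, shards));
    }
}
```

A real merge would typically walk the already-sorted per-shard lists rather than push every hit into one queue. The single priority queue here just keeps the example short.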
## TopDocsCollector Shall Throw IllegalArgumentException For Malformed Arguments ##
## TopDocsCollector Shall Throw IllegalArgumentException For Malformed Arguments

TopDocsCollector shall no longer return an empty TopDocs for malformed arguments.
Rather, an IllegalArgumentException shall be thrown. This is introduced for better
@@ -14,5 +14,5 @@ implementing Lucene (document size, number of documents, and number of
hits retrieved to name a few). The benchmarks page has some information
related to performance on particular platforms.

*To build Apache Lucene from source, refer to the `BUILD.txt` file in
*To build Apache Lucene from the source, refer to the `BUILD.txt` file in
the distribution directory.*
@@ -32,9 +32,9 @@
excludes="poms/**,**/*-src.jar,**/*-javadoc.jar"
/>
<patternset id="binary.root.dist.patterns"
includes="LICENSE.txt,NOTICE.txt,README.txt,
includes="LICENSE.txt,NOTICE.txt,README.md,
MIGRATE.txt,JRE_VERSION_MIGRATION.txt,
MIGRATE.md,JRE_VERSION_MIGRATION.md,
SYSTEM_REQUIREMENTS.txt,
SYSTEM_REQUIREMENTS.md,
CHANGES.txt,
**/lib/*.jar,
licenses/**,
@@ -229,8 +229,8 @@
</xslt>

<markdown todir="${javadoc.dir}">
<fileset dir="." includes="MIGRATE.txt,JRE_VERSION_MIGRATION.txt,SYSTEM_REQUIREMENTS.txt"/>
<fileset dir="." includes="MIGRATE.md,JRE_VERSION_MIGRATION.md,SYSTEM_REQUIREMENTS.md"/>
<globmapper from="*.txt" to="*.html"/>
<globmapper from="*.md" to="*.html"/>
</markdown>

<copy todir="${javadoc.dir}">
|
<copy todir="${javadoc.dir}">
|
||||||
|
|
Loading…
Reference in New Issue