35682 Commits

Author SHA1 Message Date
Robert Muir 6ac311068f
LUCENE-10128: avoid costly reflection in SparseFixedBitSet ctor
Seems that VectorFormat merge creates A LOT of these bitsets. We don't
need to do any fancy reflection here via shallowSizeOf(Object), when we
can call sizeOf(long[]) which is fast.

We may want to revisit this RAMUsageEstimator api in the future to
prevent traps like this.
2021-09-28 09:39:36 -04:00
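To illustrate why a dedicated `sizeOf(long[])` can be cheap, here is a minimal sketch of an array-size estimate that needs only arithmetic and no reflection. It is not Lucene's implementation: the class name, the 16-byte array header, and the 8-byte object alignment are assumptions typical of a 64-bit JVM with compressed references.

```java
final class LongArraySizeSketch {
    // Assumed JVM layout constants (hypothetical; real values depend on the JVM and its flags).
    static final int ASSUMED_ARRAY_HEADER_BYTES = 16;
    static final int ASSUMED_OBJECT_ALIGNMENT = 8;

    // Rounds a size up to the next multiple of the assumed object alignment.
    static long alignObjectSize(long size) {
        long mask = ASSUMED_OBJECT_ALIGNMENT - 1;
        return (size + mask) & ~mask;
    }

    // Estimates the heap footprint of a long[]: header plus 8 bytes per element, aligned.
    static long sizeOf(long[] array) {
        return alignObjectSize(ASSUMED_ARRAY_HEADER_BYTES + (long) Long.BYTES * array.length);
    }

    public static void main(String[] args) {
        System.out.println(sizeOf(new long[4]));
    }
}
```

A reflective shallow-size computation has to inspect the object's class and fields first, which is the cost the commit avoids in a constructor that is called very often.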
Adrien Grand 7357bdc272
LUCENE-10123: Handling of singletons in DocValuesConsumer. (#320)
This avoids double wrapping of doc values in `Lucene90DocValuesConsumer`.
2021-09-28 08:54:46 +02:00
Greg Miller 1ebd193fbe
Move CHANGES entry for LUCENE-10070 under 8.11 after backport (#323) 2021-09-27 12:15:52 -07:00
Uwe Schindler 849d5fc1ac
LUCENE-10125: Optimize primitive writes in OutputStreamIndexOutput (#321) 2021-09-27 19:04:03 +02:00
Julie Tibshirani eaa421094d
LUCENE-10109: Bump default beam width for HNSW (#312)
Lucene90HnswVectorsFormat has a default 'beam width' of 16. This is quite low
and produces poor recall on typical-sized datasets.

This commit bumps it to 100. This new default tries to balance good search
performance with indexing speed. Most runs in ann-benchmarks set the parameter
between ~400 and 800, but they are heavily optimizing search over index speed.
2021-09-24 18:02:34 -07:00
Greg Miller eb44d1e6ad
Add slightly more language in the README Contributing section (#318) 2021-09-24 12:06:06 -07:00
Nhat Nguyen 7390d1af51
LUCENE-10119: Do not set single sort with search after (#317)
We should not set single sort when the search_after is non-null; 
otherwise, we will incorrectly skip documents whose values are equal to
the value from the search_after and docIDs are greater than the docID
from the search_after.
2021-09-23 13:10:17 -04:00
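The tie-breaking rule described in LUCENE-10119 can be sketched as follows, assuming an ascending sort on a single long value. The class and method names are hypothetical and this is a simplified model, not the code in Lucene's collectors.

```java
final class SearchAfterTieSketch {
    // Correct rule: a hit comes after the search_after position if its value is greater,
    // or if the value is equal and its docID is greater.
    static boolean isAfter(long value, int doc, long afterValue, int afterDoc) {
        if (value != afterValue) {
            return value > afterValue;
        }
        return doc > afterDoc;
    }

    // Value-only rule: ignores docIDs, so every hit that ties on the value is dropped.
    static boolean isAfterValueOnly(long value, long afterValue) {
        return value > afterValue;
    }

    public static void main(String[] args) {
        // Hit (value=5, doc=7) with search_after (value=5, doc=3).
        System.out.println(isAfter(5, 7, 5, 3));
        System.out.println(isAfterValueOnly(5, 5));
    }
}
```

The value-only rule is safe only when no docID tie-break is needed, which is not the case once a search_after position is supplied.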
Uwe Schindler fc475360a8 Only pass "--illegal-access=deny" up to JDK-15, later versions deprecate the option and default to "deny" 2021-09-22 19:41:59 +02:00
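The version check described in the commit above could look roughly like this. It is a sketch only: the class and method names are hypothetical, and the real change lives in the build scripts rather than in Java code.

```java
import java.util.ArrayList;
import java.util.List;

final class IllegalAccessFlagSketch {
    // Returns the extra JVM arguments for a given Java feature version (e.g. 11, 15, 17).
    static List<String> jvmArgsFor(int javaFeatureVersion) {
        List<String> args = new ArrayList<>();
        if (javaFeatureVersion <= 15) {
            // Only older JVMs get the explicit option; later versions default to "deny".
            args.add("--illegal-access=deny");
        }
        return args;
    }

    public static void main(String[] args) {
        // Runtime.version().feature() reports the feature version of the running JVM.
        System.out.println(jvmArgsFor(Runtime.version().feature()));
    }
}
```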
Lu Xugang ed7fb8dea0
LUCENE-10116: Missing calculating the bytes used of DocsWithFieldSet and currentValues in SortedSetDocValuesWriter (#316) 2021-09-22 14:25:40 +02:00
Lu Xugang a7bddfaacc
LUCENE-10111: Missing calculating the bytes used of DocsWithFieldSet in NormValuesWriter (#307) 2021-09-22 07:44:30 +02:00
Chris Hegarty a7578709a6
LUCENE-10115: Add a fuzzy parsing extension point for custom query parsers
This commit adds the QueryParserBase::getFuzzyDistance protected method, which 
can be overridden by subclasses to provide customisation of how the similarity distance 
is determined. The default implementation retains the current behaviour.
2021-09-21 13:25:09 +01:00
Julie Tibshirani b2a04a4bb4 LUCENE-10069: Adjust TestKnnVectorQuery#testRandom to stop failures
The test fails randomly because HNSW can sometimes miss results when k is close
to the number of total docs. While we wait for a fix, this commit decreases k to
prevent failures.
2021-09-20 14:16:47 -07:00
Uwe Schindler 5871ea7972
LUCENE-10112: Improve LZ4 Compression performance with direct primitive read/writes (#310)
Co-authored-by: Tim Brooks <tim@timbrooks.org>
2021-09-20 19:12:38 +02:00
Christine Poerschke 57524c6a5e
LUCENE-9809: replace 'master' with 'main' in release wizard (#305) 2021-09-20 17:51:41 +01:00
Uwe Schindler c57d6e5f8c
LUCENE-10113: Use VarHandles to access int/long/short types in byte arrays (e.g. ByteArrayDataInput) (#308)
Co-authored-by: Robert Muir <rmuir@apache.org>
2021-09-20 15:37:33 +02:00
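As a rough illustration of the technique named in LUCENE-10113, the sketch below reads and writes an int in a byte array through a `VarHandle` view and compares it with the byte-by-byte equivalent. The class name and the choice of little-endian byte order are assumptions made for this example; it is not the code from `ByteArrayDataInput`.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteOrder;

final class ByteArrayVarHandleSketch {
    // View of a byte[] as little-endian ints; the byte order is an assumption of this sketch.
    static final VarHandle INT_LE =
        MethodHandles.byteArrayViewVarHandle(int[].class, ByteOrder.LITTLE_ENDIAN);

    // Reads four bytes starting at offset as one int, in a single access.
    static int readInt(byte[] bytes, int offset) {
        return (int) INT_LE.get(bytes, offset);
    }

    // Writes an int as four bytes starting at offset.
    static void writeInt(byte[] bytes, int offset, int value) {
        INT_LE.set(bytes, offset, value);
    }

    // Equivalent manual decoding, one byte at a time, for comparison.
    static int readIntBytewise(byte[] b, int o) {
        return (b[o] & 0xFF)
            | ((b[o + 1] & 0xFF) << 8)
            | ((b[o + 2] & 0xFF) << 16)
            | ((b[o + 3] & 0xFF) << 24);
    }

    public static void main(String[] args) {
        byte[] buf = new byte[8];
        writeInt(buf, 1, 0x12345678);
        System.out.println(readInt(buf, 1) == readIntBytewise(buf, 1));
    }
}
```

Plain `get` and `set` on a byte-array view work at unaligned offsets, which is why offset 1 is usable here.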
Adrien Grand 4bcd64c5ed LUCENE-9620: Fix test bug. 2021-09-20 09:49:13 +02:00
Uwe Schindler 075d801abe
LUCENE-10114: Remove unused byte order mark in Lucene90PostingsWriter (#309)
Co-authored-by: Robert Muir <rmuir@apache.org>
2021-09-20 08:37:05 +02:00
Jim Ferenczi ccf0d5404d
LUCENE-10110: MultiCollector should conditionally wrap single leaf collector (#303)
MultiCollector should wrap single leaf collector that wants to skip low-scoring hits
 but the combined score mode doesn't allow it.
2021-09-20 07:26:51 +02:00
Tomoko Uchida 6c1e5920d8 LUCENE-10102: do not call incrementToken() against already consumed input stream. 2021-09-20 10:58:39 +09:00
Robert Muir 8b95e51d70
Add additional docs refs (nightly, build system help/) to README.md (#302) 2021-09-19 20:24:13 -04:00
Uwe Schindler f3c3b90e35
LUCENE-9047: fix typo in javadocs (still referred to big endian) 2021-09-19 13:51:51 +02:00
Tomoko Uchida 5dfbef313c LUCENE-10102: Fix JapaneseCompletionFilter javadoc 2021-09-18 14:43:03 +09:00
goankur deff5a1f5a
LUCENE-10070: Skip deleted documents during facet counting for all documents (#293) 2021-09-17 10:35:44 -07:00
Tomoko Uchida 4e86df96c0
LUCENE-10102: Add JapaneseCompletionFilter for Input Method-aware auto-completion (#297)
Co-authored-by: Robert Muir <rmuir@apache.org>
2021-09-17 22:37:12 +09:00
Dawid Weiss de45b68c90 LUCENE-9448, LUCENE-9990: fix Luke's launcher task. 2021-09-16 08:49:26 +02:00
Nhat Nguyen b7a286dd69
LUCENE-10106: Sort optimization wrongly skip first docs (#300)
The first documents of subsequent segments are mistakenly skipped when 
sort optimization is enabled. We should initialize maxDocVisited in
NumericComparator to -1 instead of 0.
2021-09-15 09:21:59 -04:00
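The off-by-one described in LUCENE-10106 can be shown with a toy model. It assumes a check of the form "a doc at or below maxDocVisited was already seen"; the class and method are hypothetical and much simpler than the real `NumericComparator` logic.

```java
final class MaxDocVisitedSketch {
    // Counts how many docs of a segment are visited for a given initial marker value.
    static int countVisited(int[] segmentDocIds, int initialMaxDocVisited) {
        int maxDocVisited = initialMaxDocVisited;
        int visited = 0;
        for (int doc : segmentDocIds) {
            if (doc <= maxDocVisited) {
                continue; // treated as already seen, so it is skipped
            }
            maxDocVisited = doc;
            visited++;
        }
        return visited;
    }

    public static void main(String[] args) {
        int[] segment = {0, 1, 2}; // docIDs restart at 0 in every segment
        System.out.println(countVisited(segment, 0));  // doc 0 is skipped
        System.out.println(countVisited(segment, -1)); // all docs are visited
    }
}
```

Because 0 is a valid docID, it cannot double as the "nothing visited yet" marker; -1 can.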
Uwe Schindler 1586933b18 Merge branch 'main' of https://gitbox.apache.org/repos/asf/lucene into main 2021-09-15 01:19:42 +02:00
Uwe Schindler 3c6d4a00cd LUCENE-10104, SOLR-15631: Upgrade forbiddenapis to version 3.2 2021-09-15 01:19:17 +02:00
Alan Woodward 26093735cc
LUCENE-8638: Expressions haversin() method should continue to return its value in km (#299)
SloppyMath had a deprecated haversin() function that returned its values in
km, which has been replaced by a haversinMeters() function that is explicit
about its units. As part of removing this function, we changed the expressions
module haversin function to point instead to haversinMeters. However, this
may silently change the behaviour of expressions on upgrade.

This commit instead adds a haversinKilometers method to the expressions
module and maps the haversin function to it. It also adds a new
haversinMeters expression function to be more explicit for future users.
2021-09-14 14:01:10 +01:00
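The unit difference behind LUCENE-8638 can be sketched with a plain haversine implementation. This is not `SloppyMath` or the expressions module code: the class name and the mean Earth radius constant are assumptions made for this example.

```java
final class HaversinSketch {
    // Assumed mean Earth radius in meters (hypothetical constant for this sketch).
    static final double ASSUMED_EARTH_RADIUS_METERS = 6_371_008.8;

    // Great-circle distance in meters between two points given in degrees.
    static double haversinMeters(double lat1, double lon1, double lat2, double lon2) {
        double phi1 = Math.toRadians(lat1);
        double phi2 = Math.toRadians(lat2);
        double dPhi = phi2 - phi1;
        double dLambda = Math.toRadians(lon2 - lon1);
        double sinHalfPhi = Math.sin(dPhi / 2);
        double sinHalfLambda = Math.sin(dLambda / 2);
        double a = sinHalfPhi * sinHalfPhi
            + Math.cos(phi1) * Math.cos(phi2) * sinHalfLambda * sinHalfLambda;
        // Clamp to 1 to guard against rounding before taking the arcsine.
        return 2 * ASSUMED_EARTH_RADIUS_METERS * Math.asin(Math.min(1.0, Math.sqrt(a)));
    }

    // Same distance in kilometers, the unit the legacy haversin function returned.
    static double haversinKilometers(double lat1, double lon1, double lat2, double lon2) {
        return haversinMeters(lat1, lon1, lat2, lon2) / 1000.0;
    }

    public static void main(String[] args) {
        System.out.println(haversinMeters(0, 0, 0, 90));
        System.out.println(haversinKilometers(0, 0, 0, 90));
    }
}
```

An expression written against the kilometer version would see its result grow by a factor of 1000 if the function name were silently remapped to the meter version, which is the upgrade hazard the commit avoids.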
Uwe Schindler 3802bdc686
LUCENE-10101: Use getField() instead of getDeclaredField() to minimize security impact by analysis SPI discovery (#298) 2021-09-14 10:31:46 +02:00
Jim Ferenczi 19537578dd
LUCENE-10089: Disable numeric sort optimization early (#291)
This commit moves the responsibility to disable
the numeric sort optimization on comparators to the SortField.
This way we don't need to apply the logic on every top field collectors.
2021-09-13 07:31:43 +02:00
Robert Muir 56968b762a
LUCENE-10098: add note/link to GermanAnalyzer for decompounding nouns. (#294)
We can't do this out of box with the analyzer, due to incompatible
licenses. But we can make it easy on the user to do this, by linking to
repo that has sample code, documentation, and the required data files.
2021-09-12 12:55:51 -04:00
Robert Muir 24aa45dc3e
LUCENE-10096: Tamil Analyzer (#292)
Add Tamil analyzer based on snowball stemmer and TamilNLP stopwords
2021-09-10 21:02:11 -04:00
Robert Muir 8bce765218
LUCENE-10095: Nepali Analyzer (#290)
Add Nepali analyzer based on snowball stemmer and NLTK stopwords
2021-09-10 20:45:23 -04:00
Alan Woodward cc8c4283dd LUCENE-10094: Fix test bug 2021-09-10 16:32:33 +01:00
Alan Woodward 1bb52859c8
LUCENE-10094: Delegate count() from CachingWrapperWeight (#289)
CachingWrapperWeight always returns -1 from its count() method, which
disables the fast path for TermQuery, MatchAllDocQuery, etc, when running
IndexSearcher.count(Query). This commit makes it delegate the method
to its wrapped Weight.
2021-09-10 10:45:20 +01:00
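The delegation fix in LUCENE-10094 follows a common wrapper pattern, sketched below with hypothetical stand-in types. `SketchWeight` and the classes around it are invented for this example and are not Lucene's `Weight` API; the only convention borrowed from the commit message is that -1 means the count is not cheaply known.

```java
final class CountDelegationSketch {
    // Hypothetical stand-in for a weight: count() returns -1 when no fast count is available.
    interface SketchWeight {
        int count();
    }

    // A weight that knows its match count up front, like the fast paths the commit mentions.
    static final class KnownCountWeight implements SketchWeight {
        private final int matches;

        KnownCountWeight(int matches) {
            this.matches = matches;
        }

        @Override
        public int count() {
            return matches;
        }
    }

    // Wrapper before the fix: always reports "unknown", hiding the wrapped fast path.
    static final class NonDelegatingWrapper implements SketchWeight {
        private final SketchWeight in;

        NonDelegatingWrapper(SketchWeight in) {
            this.in = in;
        }

        @Override
        public int count() {
            return -1;
        }
    }

    // Wrapper after the fix: forwards count() to the wrapped weight.
    static final class DelegatingWrapper implements SketchWeight {
        private final SketchWeight in;

        DelegatingWrapper(SketchWeight in) {
            this.in = in;
        }

        @Override
        public int count() {
            return in.count();
        }
    }

    public static void main(String[] args) {
        SketchWeight inner = new KnownCountWeight(42);
        System.out.println(new NonDelegatingWrapper(inner).count());
        System.out.println(new DelegatingWrapper(inner).count());
    }
}
```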
zacharymorn 7f8607b59e
LUCENE-9662: Update concurrent index checking usage instructions and default thread count to CPU cores (#281) 2021-09-09 20:20:42 -07:00
Mike McCandless 42242b1745 add a small test to TestVersion to confirm we handle non-floating-point release numbers correctly 2021-09-09 10:43:53 -04:00
Mayya Sharipova cc58c51941
LUCENE-10089 Disable numeric sort optim when needed (#286)
Add a method to SortField that allows enabling or disabling the numeric
sort optimization with points, which is enabled by default from 9.0.
2021-09-09 10:22:42 -04:00
Mike McCandless ee0695eda8 LUCENE-10092: fix test bug by forceMerging the index down to one segment 2021-09-08 14:01:10 -04:00
Adrien Grand 7eb35be045
LUCENE-10087: Validate number of dimensions and bytes per dimension for numeric SortFields. (#283) 2021-09-07 13:28:39 +02:00
Mayya Sharipova bc161e6dcc
LUCENE-10040 Correct TestHnswGraph.testSearchWithAcceptOrds (#277)
If we set numSeed = 10, this test sometimes fails because it may mark
expected result docs (from 0 to 9) as deleted, so they don't end up
being retrieved, resulting in low recall.

- set numSeed to 10 to ensure 10 results are returned
- add startIndex parameter to createRandomAcceptOrds that allows
  documents before startIndex to be NOT deleted
- use startIndex equal to 10 for createRandomAcceptOrds

Relates to #239
2021-09-06 06:56:15 -04:00
Jim Ferenczi 4df8d641ac
LUCENE-10081: KoreanTokenizer should check the max backtrace gap on whitespaces (#272)
This change ensures that we don't skip consecutive whitespaces without checking the maximum backtrace gap.
2021-09-06 08:46:39 +02:00
Mike McCandless 34f37d0d43 LUCENE-10035: move CHANGES.txt entry from 9.0 to 8.10 2021-09-03 10:21:28 -04:00
Adrien Grand b3ce44cd0d LUCENE-9620: Implement AssertingWeight#count. 2021-09-03 14:44:07 +02:00
Adrien Grand 4bb018e904 LUCENE-9620: Fix TestTermQuery failure. 2021-09-03 10:48:01 +02:00
Adrien Grand de661d6535 LUCENE-9620: Address profiling test failures. 2021-09-03 10:48:01 +02:00
zacharymorn d4e4fe22b1
Revert "LUCENE-9959: Add non thread local based API for term vector reader usage (#180)" (#280)
This reverts commit 180cfa241b.
2021-09-03 00:31:18 -07:00
Gautam Worah 44e9f5de53
LUCENE-9620 Add Weight#count(LeafReaderContext) (#242)
Add a default implementation in Weight.java and add sample faster
implementations in MatchAllDocsQuery, MatchNoDocsQuery, TermQuery

Add tests for BooleanQuery and TermQuery

Co-authored-by: Gautam Worah <gauworah@amazon.com>
Co-authored-by: Adrien Grand <jpountz@gmail.com>
2021-09-03 09:09:38 +02:00
Houston Putman 059d06cec7
Fix gpg key download in release wizard. (#279)
Old URL to check the apache id gpg key is no longer available.
2021-09-02 18:08:57 -04:00