35457 Commits

Author SHA1 Message Date
Nhat Nguyen b7a286dd69
LUCENE-10106: Sort optimization wrongly skip first docs (#300)
The first documents of subsequent segments are mistakenly skipped when 
sort optimization is enabled. We should initialize maxDocVisited in
NumericComparator to -1 instead of 0.
2021-09-15 09:21:59 -04:00
Uwe Schindler 1586933b18 Merge branch 'main' of https://gitbox.apache.org/repos/asf/lucene into main 2021-09-15 01:19:42 +02:00
Uwe Schindler 3c6d4a00cd LUCENE-10104, SOLR-15631: Upgrade forbiddenapis to version 3.2 2021-09-15 01:19:17 +02:00
Alan Woodward 26093735cc
LUCENE-8638: Expressions haversin() method should continue to return its value in km (#299)
SloppyMath had a deprecated haversin() function that returned its values in
km, which has been replaced by a haversinMeters() function that is explicit
about its units. As part of removing this function, we changed the expressions
module haversin function to point instead to haversinMeters. However, this
may silently change the behaviour of expressions on upgrade.

This commit instead adds a haversinKilometers method to the expressions
module and maps the haversin function to it. It also adds a new
haversinMeters expression function to be more explicit for future users.
2021-09-14 14:01:10 +01:00
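
To illustrate the unit issue described in LUCENE-8638 above, here is a minimal, self-contained sketch of a kilometre-returning function layered on a metre-returning one. It is not the code from the commit or from SloppyMath: the class name, the textbook haversine formula, and the Earth radius constant are all assumptions made for illustration.

```java
final class HaversinUnits {
  // Assumed mean Earth radius in meters; chosen for this sketch, not taken from the commit.
  static final double EARTH_RADIUS_METERS = 6_371_008.7714;

  /** Great-circle distance in meters between two lat/lon points given in degrees. */
  static double haversinMeters(double lat1, double lon1, double lat2, double lon2) {
    double phi1 = Math.toRadians(lat1);
    double phi2 = Math.toRadians(lat2);
    double sinHalfDLat = Math.sin((phi2 - phi1) / 2);
    double sinHalfDLon = Math.sin(Math.toRadians(lon2 - lon1) / 2);
    double a = sinHalfDLat * sinHalfDLat
        + Math.cos(phi1) * Math.cos(phi2) * sinHalfDLon * sinHalfDLon;
    // Clamp to 1.0 so rounding error cannot push asin() out of its domain.
    return 2 * EARTH_RADIUS_METERS * Math.asin(Math.min(1.0, Math.sqrt(a)));
  }

  /** Same distance in kilometers, so callers that expect km keep their old results. */
  static double haversinKilometers(double lat1, double lon1, double lat2, double lon2) {
    return haversinMeters(lat1, lon1, lat2, lon2) / 1000.0;
  }
}
```

Keeping the kilometre variant as a thin wrapper means the two functions can never disagree by anything other than the factor of 1000.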
Uwe Schindler 3802bdc686
LUCENE-10101: Use getField() instead of getDeclaredField() to minimize security impact by analysis SPI discovery (#298) 2021-09-14 10:31:46 +02:00
Jim Ferenczi 19537578dd
LUCENE-10089: Disable numeric sort optimization early (#291)
This commit moves the responsibility to disable
the numeric sort optimization on comparators to the SortField.
This way we don't need to apply the logic in every top field collector.
2021-09-13 07:31:43 +02:00
Robert Muir 56968b762a
LUCENE-10098: add note/link to GermanAnalyzer for decompounding nouns. (#294)
We can't do this out of the box with the analyzer, due to incompatible
licenses. But we can make it easy for the user to do this, by linking to
a repo that has sample code, documentation, and the required data files.
2021-09-12 12:55:51 -04:00
Robert Muir 24aa45dc3e
LUCENE-10096: Tamil Analyzer (#292)
Add Tamil analyzer based on snowball stemmer and TamilNLP stopwords
2021-09-10 21:02:11 -04:00
Robert Muir 8bce765218
LUCENE-10095: Nepali Analyzer (#290)
Add Nepali analyzer based on snowball stemmer and NLTK stopwords
2021-09-10 20:45:23 -04:00
Alan Woodward cc8c4283dd LUCENE-10094: Fix test bug 2021-09-10 16:32:33 +01:00
Alan Woodward 1bb52859c8
LUCENE-10094: Delegate count() from CachingWrapperWeight (#289)
CachingWrapperWeight always returns -1 from its count() method, which
disables the fast path for TermQuery, MatchAllDocsQuery, etc., when running
IndexSearcher.count(Query). This commit makes it delegate the method
to its wrapped Weight.
2021-09-10 10:45:20 +01:00
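
The delegation described in LUCENE-10094 above can be sketched with a small stand-in type. This is only an illustration under stated assumptions: `SimpleWeight`, `FixedCountWeight`, `CachingWrapper`, and `countHits` are hypothetical names, and the real Lucene method takes a leaf reader context argument that is omitted here. The only facts taken from the commit messages are that -1 disables the fast path and that the wrapper should forward the call to the Weight it wraps.

```java
import java.util.function.IntSupplier;

final class CountDelegation {
  /** Hypothetical stand-in for a Weight; -1 means "count not cheaply known". */
  interface SimpleWeight {
    default int count() {
      return -1;
    }
  }

  /** A weight that knows its hit count up front (think of a single-term query). */
  static final class FixedCountWeight implements SimpleWeight {
    private final int count;

    FixedCountWeight(int count) {
      this.count = count;
    }

    @Override
    public int count() {
      return count;
    }
  }

  /** Wrapper that forwards count() instead of inheriting the -1 default. */
  static final class CachingWrapper implements SimpleWeight {
    private final SimpleWeight in;

    CachingWrapper(SimpleWeight in) {
      this.in = in;
    }

    @Override
    public int count() {
      return in.count();
    }
  }

  /** Uses the fast count when available, otherwise falls back to the slow path. */
  static int countHits(SimpleWeight weight, IntSupplier slowPath) {
    int fast = weight.count();
    return fast >= 0 ? fast : slowPath.getAsInt();
  }
}
```

Without the override in `CachingWrapper`, every wrapped weight would report -1 and `countHits` would always take the slow path, which is the behaviour the commit removes.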
zacharymorn 7f8607b59e
LUCENE-9662: Update concurrent index checking usage instructions and default thread count to CPU cores (#281) 2021-09-09 20:20:42 -07:00
Mike McCandless 42242b1745 add a small test to TestVersion to confirm we handle non-floating-point release numbers correctly 2021-09-09 10:43:53 -04:00
Mayya Sharipova cc58c51941
LUCENE-10089 Disable numeric sort optim when needed (#286)
Add a method to SortField that allows enabling/disabling the numeric
sort optimization with points, which is enabled by default from 9.0.
2021-09-09 10:22:42 -04:00
Mike McCandless ee0695eda8 LUCENE-10092: fix test bug by forceMerging the index down to one segment 2021-09-08 14:01:10 -04:00
Adrien Grand 7eb35be045
LUCENE-10087: Validate number of dimensions and bytes per dimension for numeric SortFields. (#283) 2021-09-07 13:28:39 +02:00
Mayya Sharipova bc161e6dcc
LUCENE-10040 Correct TestHnswGraph.testSearchWithAcceptOrds (#277)
If we set numSeed = 10, this test sometimes fails because it may mark
expected result docs (from 0 to 9) as deleted, so they don't end up
being retrieved, resulting in low recall

- set numSeed to 10 to ensure 10 results are returned
- add startIndex parameter to createRandomAcceptOrds that allows
  documents before startIndex to be NOT deleted
- use startIndex equal to 10 for createRandomAcceptOrds

Relates to #239
2021-09-06 06:56:15 -04:00
Jim Ferenczi 4df8d641ac
LUCENE-10081: KoreanTokenizer should check the max backtrace gap on whitespaces (#272)
This change ensures that we don't skip consecutive whitespaces without checking the maximum backtrace gap.
2021-09-06 08:46:39 +02:00
Mike McCandless 34f37d0d43 LUCENE-10035: move CHANGES.txt entry from 9.0 to 8.10 2021-09-03 10:21:28 -04:00
Adrien Grand b3ce44cd0d LUCENE-9620: Implement AssertingWeight#count. 2021-09-03 14:44:07 +02:00
Adrien Grand 4bb018e904 LUCENE-9620: Fix TestTermQuery failure. 2021-09-03 10:48:01 +02:00
Adrien Grand de661d6535 LUCENE-9620: Address profiling test failures. 2021-09-03 10:48:01 +02:00
zacharymorn d4e4fe22b1
Revert "LUCENE-9959: Add non thread local based API for term vector reader usage (#180)" (#280)
This reverts commit 180cfa241b.
2021-09-03 00:31:18 -07:00
Gautam Worah 44e9f5de53
LUCENE-9620 Add Weight#count(LeafReaderContext) (#242)
Add a default implementation in Weight.java and add sample faster
implementations in MatchAllDocsQuery, MatchNoDocsQuery, TermQuery

Add tests for BooleanQuery and TermQuery

Co-authored-by: Gautam Worah <gauworah@amazon.com>
Co-authored-by: Adrien Grand <jpountz@gmail.com>
2021-09-03 09:09:38 +02:00
Houston Putman 059d06cec7
Fix gpg key download in release wizard. (#279)
Old URL to check the apache id gpg key is no longer available.
2021-09-02 18:08:57 -04:00
Mayya Sharipova 54179e9372
LUCENE-10063 Correct BaseKnnVectorsFormatTestCase.testRandomWithUpdatesAndGraph (#278)
- Make sure that k > 0 for knn search
- Make sure that k doesn't exceed the number of live docs

Relates to #262
2021-09-02 16:23:31 -04:00
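
The two constraints listed in LUCENE-10063 above (k > 0, and k no larger than the number of live docs) amount to clamping k into a valid range. A minimal sketch follows; the class and method names are hypothetical and this is not the code from the test case.

```java
final class KnnTestParams {
  /**
   * Clamps a requested k into [1, numLiveDocs].
   * Assumes at least one live document; otherwise no valid k exists.
   */
  static int clampK(int requestedK, int numLiveDocs) {
    if (numLiveDocs < 1) {
      throw new IllegalArgumentException("need at least one live doc, got " + numLiveDocs);
    }
    return Math.max(1, Math.min(requestedK, numLiveDocs));
  }
}
```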
Adrien Grand eb2509c846 LUCENE-10035: Fix CHANGES entry. 2021-09-02 18:37:04 +02:00
Robert Muir b0611a14d0
LUCENE-10083: add CHANGES entry for Telugu analyzer 2021-09-02 12:20:34 -04:00
vinodrenu 544dbbea46
LUCENE-10083: Analyzer and stemmer for Telugu language (#275)
* initial version of Telugu analyzer
* made entries for factories and added few more terms in stemmer
* added two more terms
* added few more terms
* added long to short vowel conversion
* added test cases
* applied code formatting rules
* fixed unclosed p tag in javadoc
* spotlessApply removed the closing p tag
2021-09-02 12:00:13 -04:00
Gautam Worah 1036c708db
LUCENE-9476: Add getBulkPath API to DirectoryTaxonomyReader for faster ordinal -> FacetLabel lookup (#179)
Co-authored-by: Gautam Worah <gauworah@amazon.com>
2021-09-02 07:54:31 -04:00
zacharymorn 34232430f2
LUCENE-9662: fix test failure from merging away soft-deletes (#276) 2021-09-01 22:18:29 -07:00
Michael Sokolov ee7a719dd8 LUCENE-10082: add detail to schema inconsistency error messages 2021-09-01 23:11:35 +00:00
Michael Sokolov e3e54c95c9
LUCENE-10063: test fixes relating to SimpleTextKnnVectorsReader (#273) 2021-09-01 08:19:11 -04:00
zacharymorn 424192e170
LUCENE-9662: CheckIndex should be concurrent - parallelizing index check across segments (#128) 2021-08-31 19:24:14 -07:00
Michael Sokolov 9c7f0d45ee
LUCENE-10063: implement SimpleTextKnnVectorsReader.search 2021-08-31 13:55:13 -04:00
wuda 6ade29c71a
LUCENE-10035: Simple text codec add multi level skip list data (#224) 2021-08-30 15:27:42 +02:00
Dawid Weiss e470535072
LUCENE-9654: Expressions module grammar antlr code regeneration (#269) 2021-08-27 12:47:19 +02:00
Greg Miller 3b3f9600c2
Fix a DrillSideways unit test I broke when adding more tests in LUCENE-10060 (#268) 2021-08-26 14:44:52 -07:00
Greg Miller dbf7e1865f
LUCENE-10060: Ensure DrillSidewaysQuery instances never get cached (#261) 2021-08-26 06:06:54 -07:00
Adrien Grand f1fdd2465c
LUCENE-9917: Smaller block sizes for BEST_SPEED. (#257)
This reduces the block size for BEST_SPEED in order to trade some compression
ratio in exchange for better retrieval speed.
2021-08-26 15:04:51 +02:00
Dawid Weiss f6e3b08ae9
LUCENE-10072: Regenerate FST dictionaries after LUCENE-9047. (#265) 2021-08-26 11:31:16 +02:00
Dawid Weiss 39a2fc62d4
LUCENE-10066: Build does not work with JDK16 as gradle's runtime (#259) 2021-08-26 10:08:37 +02:00
Adrien Grand 2d7590a355
LUCENE-9613, LUCENE-10067: Further specialize ordinals. (#260) 2021-08-26 09:44:24 +02:00
David Smiley 8ac2673791
LUCENE-10003: No C style array declaration (#206)
Most cases of C-style array declarations have been switched. The Google Java Format, which we adhere to, disallows C-style array declarations: https://google.github.io/styleguide/javaguide.html#s4.8.3-arrays
Some cases (esp. Snowball) can't be updated.
2021-08-25 17:06:41 -04:00
Michael McCandless 88588e3dea
LUCENE-10052: cutover more tests to newBytesRef, and finally catches a fly (FSTTermsReader.IntersectEnum was illegally ignoring BytesRef.offset, yay!) (#258) 2021-08-25 12:18:23 -04:00
Adrien Grand 8917fbe039 LUCENE-9613, LUCENE-10067: Add more specialization for the ordinals case. 2021-08-25 14:34:04 +02:00
Dawid Weiss 45868a52f1 LUCENE-9990: upgrade to gradle 7.2. 2021-08-25 10:04:42 +02:00
Dawid Weiss 0d07104de0 Piggyback spotless upgrade to 5.14.3 2021-08-25 10:03:59 +02:00
Dawid Weiss a8d4f658de Upgrade to gradle 7.2 2021-08-25 10:03:59 +02:00
Dawid Weiss 0cbafa4879 Fix gradle error hints. 2021-08-25 10:03:59 +02:00