Commit Graph

35394 Commits

Author SHA1 Message Date
Mayya Sharipova cc58c51941
LUCENE-10089 Disable numeric sort optim when needed (#286)
Add a method to SortField that allows enabling/disabling numeric
sort optimization with points, which is enabled by default as of 9.0.
2021-09-09 10:22:42 -04:00
Mike McCandless ee0695eda8 LUCENE-10092: fix test bug by forceMerging the index down to one segment 2021-09-08 14:01:10 -04:00
Adrien Grand 7eb35be045
LUCENE-10087: Validate number of dimensions and bytes per dimension for numeric SortFields. (#283) 2021-09-07 13:28:39 +02:00
Mayya Sharipova bc161e6dcc
LUCENE-10040 Correct TestHnswGraph.testSearchWithAcceptOrds (#277)
If we set numSeed = 10, this test sometimes fails because it may mark
the expected result docs (0 to 9) as deleted; these are then not
retrieved, resulting in low recall

- set numSeed to 10 to ensure 10 results are returned
- add a startIndex parameter to createRandomAcceptOrds that ensures
  documents before startIndex are NOT deleted
- use startIndex equal to 10 for createRandomAcceptOrds

Relates to #239
2021-09-06 06:56:15 -04:00
Jim Ferenczi 4df8d641ac
LUCENE-10081: KoreanTokenizer should check the max backtrace gap on whitespaces (#272)
This change ensures that we don't skip consecutive whitespaces without checking the maximum backtrace gap.
2021-09-06 08:46:39 +02:00
Mike McCandless 34f37d0d43 LUCENE-10035: move CHANGES.txt entry from 9.0 to 8.10 2021-09-03 10:21:28 -04:00
Adrien Grand b3ce44cd0d LUCENE-9620: Implement AssertingWeight#count. 2021-09-03 14:44:07 +02:00
Adrien Grand 4bb018e904 LUCENE-9620: Fix TestTermQuery failure. 2021-09-03 10:48:01 +02:00
Adrien Grand de661d6535 LUCENE-9620: Address profiling test failures. 2021-09-03 10:48:01 +02:00
zacharymorn d4e4fe22b1
Revert "LUCENE-9959: Add non thread local based API for term vector reader usage (#180)" (#280)
This reverts commit 180cfa241b.
2021-09-03 00:31:18 -07:00
Gautam Worah 44e9f5de53
LUCENE-9620 Add Weight#count(LeafReaderContext) (#242)
Add a default implementation in Weight.java and add sample faster
implementations in MatchAllDocsQuery, MatchNoDocsQuery, TermQuery

Add tests for BooleanQuery and TermQuery

Co-authored-by: Gautam Worah <gauworah@amazon.com>
Co-authored-by: Adrien Grand <jpountz@gmail.com>
2021-09-03 09:09:38 +02:00
Houston Putman 059d06cec7
Fix gpg key download in release wizard. (#279)
Old URL to check the apache id gpg key is no longer available.
2021-09-02 18:08:57 -04:00
Mayya Sharipova 54179e9372
LUCENE-10063 Correct BaseKnnVectorsFormatTestCase.testRandomWithUpdatesAndGraph (#278)
- Make sure that k > 0 for knn search
- Make sure that k doesn't exceed the number of live docs

Relates to #262
2021-09-02 16:23:31 -04:00
Adrien Grand eb2509c846 LUCENE-10035: Fix CHANGES entry. 2021-09-02 18:37:04 +02:00
Robert Muir b0611a14d0
LUCENE-10083: add CHANGES entry for Telugu analyzer 2021-09-02 12:20:34 -04:00
vinodrenu 544dbbea46
LUCENE-10083: Analyzer and stemmer for Telugu language (#275)
* initial version of Telugu analyzer
* made entries for factories and added few more terms in stemmer
* added two more terms
* added a few more terms
* added long to short vowel conversion
* added test cases
* applied code formatting rules
* fixed unclosed p tag in javadoc
* spotlessApply removed the closing p tag
2021-09-02 12:00:13 -04:00
Gautam Worah 1036c708db
LUCENE-9476: Add getBulkPath API to DirectoryTaxonomyReader for faster ordinal -> FacetLabel lookup (#179)
Co-authored-by: Gautam Worah <gauworah@amazon.com>
2021-09-02 07:54:31 -04:00
zacharymorn 34232430f2
LUCENE-9662: fix test failure from merging away soft-deletes (#276) 2021-09-01 22:18:29 -07:00
Michael Sokolov ee7a719dd8 LUCENE-10082: add detail to schema inconsistency error messages 2021-09-01 23:11:35 +00:00
Michael Sokolov e3e54c95c9
LUCENE-10063: test fixes relating to SimpleTextKnnVectorsReader (#273) 2021-09-01 08:19:11 -04:00
zacharymorn 424192e170
LUCENE-9662: CheckIndex should be concurrent - parallelizing index check across segments (#128) 2021-08-31 19:24:14 -07:00
Michael Sokolov 9c7f0d45ee
LUCENE-10063: implement SimpleTextKnnVectorsReader.search 2021-08-31 13:55:13 -04:00
wuda 6ade29c71a
LUCENE-10035: Simple text codec add multi level skip list data (#224) 2021-08-30 15:27:42 +02:00
Dawid Weiss e470535072
LUCENE-9654: Expressions module grammar antlr code regeneration (#269) 2021-08-27 12:47:19 +02:00
Greg Miller 3b3f9600c2
Fix a DrillSideways unit test I broke when adding more tests in LUCENE-10060 (#268) 2021-08-26 14:44:52 -07:00
Greg Miller dbf7e1865f
LUCENE-10060: Ensure DrillSidewaysQuery instances never get cached (#261) 2021-08-26 06:06:54 -07:00
Adrien Grand f1fdd2465c
LUCENE-9917: Smaller block sizes for BEST_SPEED. (#257)
This reduces the block size for BEST_SPEED in order to trade some compression
ratio in exchange for better retrieval speed.
2021-08-26 15:04:51 +02:00
Dawid Weiss f6e3b08ae9
LUCENE-10072: Regenerate FST dictionaries after LUCENE-9047. (#265) 2021-08-26 11:31:16 +02:00
Dawid Weiss 39a2fc62d4
LUCENE-10066: Build does not work with JDK16 as gradle's runtime (#259) 2021-08-26 10:08:37 +02:00
Adrien Grand 2d7590a355
LUCENE-9613, LUCENE-10067: Further specialize ordinals. (#260) 2021-08-26 09:44:24 +02:00
David Smiley 8ac2673791
LUCENE-10003: No C style array declaration (#206)
Most cases of C-style array declarations have been switched. The Google Java Format, which we adhere to, disallows C-style array declarations: https://google.github.io/styleguide/javaguide.html#s4.8.3-arrays
Some cases (esp. Snowball) can't be updated.
2021-08-25 17:06:41 -04:00
Michael McCandless 88588e3dea
LUCENE-10052: cutover more tests to newBytesRef, and finally catches a fly (FSTTermsReader.IntersectEnum was illegally ignoring BytesRef.offset, yay!) (#258) 2021-08-25 12:18:23 -04:00
Adrien Grand 8917fbe039 LUCENE-9613, LUCENE-10067: Add more specialization for the ordinals case. 2021-08-25 14:34:04 +02:00
Dawid Weiss 45868a52f1 LUCENE-9990: upgrade to gradle 7.2. 2021-08-25 10:04:42 +02:00
Dawid Weiss 0d07104de0 Piggyback spotless upgrade to 5.14.3 2021-08-25 10:03:59 +02:00
Dawid Weiss a8d4f658de Upgrade to gradle 7.2 2021-08-25 10:03:59 +02:00
Dawid Weiss 0cbafa4879 Fix gradle error hints. 2021-08-25 10:03:59 +02:00
Dawid Weiss fdccdee734 Move logging to info level. 2021-08-25 10:03:59 +02:00
Dawid Weiss 26eb84a3b5 Fix immutable properties. Fix ant uri namespace no longer working (seems like gradle regression). 2021-08-25 10:03:59 +02:00
Dawid Weiss 2b0378cd4a Use JavaInfo instead of toolchains. Internal but works and is free of toolchain's quirks. 2021-08-25 10:03:59 +02:00
Dawid Weiss 68cf86ba35 Experiments with the new apis. 2021-08-25 10:03:59 +02:00
Dawid Weiss 72f373791e Upgrade palantir's plugin. 2021-08-25 10:03:59 +02:00
Dawid Weiss 3ff4263535 Upgrade gradle to 7.1.1 2021-08-25 10:03:59 +02:00
Dawid Weiss 523cea2c5d Revert "Adding initial patch by Gautam Worah" (restore pristine main)
This reverts commit 067ab4f503aabea59639e692e3ea9ee30750c68e.
2021-08-25 10:03:59 +02:00
Dawid Weiss bac22d6116 Adding initial patch by Gautam Worah 2021-08-25 10:03:59 +02:00
Mayya Sharipova fc67d6aa6e Revert "LUCENE-10054 Make HnswGraph hierarchical (#250)"
This reverts commit 257d256def.

We've decided to have a separate feature branch for HNSW,
and put all related changes there.
2021-08-24 14:58:59 -04:00
Julie Tibshirani 782c3cca3a
LUCENE-10040: Relax TestKnnVectorQuery#testDeletes assertion (#251)
TestKnnVectorQuery#testDeletes assumes that if there are n total documents, we
can perform a kNN search with k=n and retrieve all documents. This isn't true
with our implementation: due to randomization, we may select fewer than n entry
points and never visit some vectors.
2021-08-24 11:15:27 -07:00
Adrien Grand 83ba5d859c
LUCENE-7020: Remove TieredMergePolicy#setMaxMergeAtOnceExplicit. (#230)
TieredMergePolicy no longer bounds the number of segments that can be merged via
a forced merge.
2021-08-24 10:27:00 +02:00
Mayya Sharipova 257d256def
LUCENE-10054 Make HnswGraph hierarchical (#250)
Currently HNSW has only a single layer.
This is the first part to make it multi-layered.

To keep changes small, this PR only adds
multiple layers in the HnswGraph class.

TODO for following PRs:
- modify graph construction and search algorithm for a hierarchical
graph.
- modify Lucene90HnswVectorsWriter and Lucene90HnswVectorsReader to
write and read multiple layers
2021-08-23 15:54:26 -04:00
Greg Miller 46fa09d265
LUCENE-5309: Optimize facet counting for single-valued SSDV / StringValueFacetCounts (#255) 2021-08-23 10:01:23 -07:00