Commit Graph

35282 Commits

Author SHA1 Message Date
Mayya Sharipova 54179e9372
LUCENE-10063 Correct BaseKnnVectorsFormatTestCase.testRandomWithUpdatesAndGraph (#278)
- Make sure that k > 0 for knn search
- Make sure that k doesn't exceed the number of live docs

Relates to #262
2021-09-02 16:23:31 -04:00
Adrien Grand eb2509c846 LUCENE-10035: Fix CHANGES entry. 2021-09-02 18:37:04 +02:00
Robert Muir b0611a14d0
LUCENE-10083: add CHANGES entry for Telugu analyzer 2021-09-02 12:20:34 -04:00
vinodrenu 544dbbea46
LUCENE-10083: Analyzer and stemmer for Telugu language (#275)
* initial version of Telugu analyzer
* made entries for factories and added few more terms in stemmer
* added two more terms
* added few more terms
* added long to short vowel conversion
* added test cases
* applied code formatting rules
* fixed unclosed p tag in javadoc
* spotlessApply removed the closing p tag
2021-09-02 12:00:13 -04:00
Gautam Worah 1036c708db
LUCENE-9476: Add getBulkPath API to DirectoryTaxonomyReader for faster ordinal -> FacetLabel lookup (#179)
Co-authored-by: Gautam Worah <gauworah@amazon.com>
2021-09-02 07:54:31 -04:00
zacharymorn 34232430f2
LUCENE-9662: fix test failure from merging away soft-deletes (#276) 2021-09-01 22:18:29 -07:00
Michael Sokolov ee7a719dd8 LUCENE-10082: add detail to schema inconsistency error messages 2021-09-01 23:11:35 +00:00
Michael Sokolov e3e54c95c9
LUCENE-10063: test fixes relating to SimpleTextKnnVectorsReader (#273) 2021-09-01 08:19:11 -04:00
zacharymorn 424192e170
LUCENE-9662: CheckIndex should be concurrent - parallelizing index check across segments (#128) 2021-08-31 19:24:14 -07:00
Michael Sokolov 9c7f0d45ee
LUCENE-10063: implement SimpleTextKnnvectorsReader.search 2021-08-31 13:55:13 -04:00
wuda 6ade29c71a
LUCENE-10035: Simple text codec add multi level skip list data (#224) 2021-08-30 15:27:42 +02:00
Dawid Weiss e470535072
LUCENE-9654: Expressions module grammar antlr code regeneration (#269) 2021-08-27 12:47:19 +02:00
Greg Miller 3b3f9600c2
Fix a DrillSideways unit test I broke when adding more tests in LUCENE-10060 (#268) 2021-08-26 14:44:52 -07:00
Greg Miller dbf7e1865f
LUCENE-10060: Ensure DrillSidewaysQuery instances never get cached (#261) 2021-08-26 06:06:54 -07:00
Adrien Grand f1fdd2465c
LUCENE-9917: Smaller block sizes for BEST_SPEED. (#257)
This reduces the block size for BEST_SPEED in order to trade some compression
ratio in exchange for better retrieval speed.
2021-08-26 15:04:51 +02:00
Dawid Weiss f6e3b08ae9
LUCENE-10072: Regenerate FST dictionaries after LUCENE-9047. (#265) 2021-08-26 11:31:16 +02:00
Dawid Weiss 39a2fc62d4
LUCENE-10066: Build does not work with JDK16 as gradle's runtime (#259) 2021-08-26 10:08:37 +02:00
Adrien Grand 2d7590a355
LUCENE-9613, LUCENE-10067: Further specialize ordinals. (#260) 2021-08-26 09:44:24 +02:00
David Smiley 8ac2673791
LUCENE-10003: No C style array declaration (#206)
Most cases of C-style array declarations have been switched. The Google Java Format, which we adhere to, disallows C-style array declarations: https://google.github.io/styleguide/javaguide.html#s4.8.3-arrays
Some cases (esp. Snowball) can't be updated.
2021-08-25 17:06:41 -04:00
Michael McCandless 88588e3dea
LUCENE-10052: cutover more tests to newBytesRef, and finally catches a fly (FSTTermsReader.IntersectEnum was illegally ignoring BytesRef.offset, yay!) (#258) 2021-08-25 12:18:23 -04:00
Adrien Grand 8917fbe039 LUCENE-9613, LUCENE-10067: Add more specialization for the ordinals case. 2021-08-25 14:34:04 +02:00
Dawid Weiss 45868a52f1 LUCENE-9990: upgrade to gradle 7.2. 2021-08-25 10:04:42 +02:00
Dawid Weiss 0d07104de0 Piggyback spotless upgrade to 5.14.3 2021-08-25 10:03:59 +02:00
Dawid Weiss a8d4f658de Upgrade to gradle 7.2 2021-08-25 10:03:59 +02:00
Dawid Weiss 0cbafa4879 Fix gradle error hints. 2021-08-25 10:03:59 +02:00
Dawid Weiss fdccdee734 Move logging to info level. 2021-08-25 10:03:59 +02:00
Dawid Weiss 26eb84a3b5 Fix immutable properties. Fix ant uri namespace no longer working (seems like gradle regression). 2021-08-25 10:03:59 +02:00
Dawid Weiss 2b0378cd4a Use JavaInfo instead of toolchains. Internal but works and is free of toolchain's quirks. 2021-08-25 10:03:59 +02:00
Dawid Weiss 68cf86ba35 Experiments with the new apis. 2021-08-25 10:03:59 +02:00
Dawid Weiss 72f373791e Upgrade palantir's plugin. 2021-08-25 10:03:59 +02:00
Dawid Weiss 3ff4263535 Upgrade gradle to 7.1.1 2021-08-25 10:03:59 +02:00
Dawid Weiss 523cea2c5d Revert "Adding initial patch by Gautam Worah" (restore pristine main)
This reverts commit 067ab4f503aabea59639e692e3ea9ee30750c68e.
2021-08-25 10:03:59 +02:00
Dawid Weiss bac22d6116 Adding initial patch by Gautam Worah 2021-08-25 10:03:59 +02:00
Mayya Sharipova fc67d6aa6e Revert "LUCENE-10054 Make HnswGraph hierarchical (#250)"
This reverts commit 257d256def.

We've decided to have a separate feature branch for HNSW,
and put all related changes there.
2021-08-24 14:58:59 -04:00
Julie Tibshirani 782c3cca3a
LUCENE-10040: Relax TestKnnVectorQuery#testDeletes assertion (#251)
TestKnnVectorQuery#testDeletes assumes that if there are n total documents, we
can perform a kNN search with k=n and retrieve all documents. This isn't true
with our implementation -- due to randomization we may select fewer than n entry
points and never visit some vectors.
2021-08-24 11:15:27 -07:00
Adrien Grand 83ba5d859c
LUCENE-7020: Remove TieredMergePolicy#setMaxMergeAtOnceExplicit. (#230)
TieredMergePolicy no longer bounds the number of segments that can be merged via
a forced merge.
2021-08-24 10:27:00 +02:00
Mayya Sharipova 257d256def
LUCENE-10054 Make HnswGraph hierarchical (#250)
Currently HNSW has only a single layer.
This is the first part to make it multi-layered.

To keep changes small, this PR only adds
multiple layers in the HnswGraph class.

TODO for following PRs:
- modify graph construction and search algorithm for a hierarchical
graph.
- modify Lucene90HnswVectorsWriter and Lucene90HnswVectorsReader to
write and read multiple layers
2021-08-23 15:54:26 -04:00
Greg Miller 46fa09d265
LUCENE-5309: Optimize facet counting for single-valued SSDV / StringValueFacetCounts (#255) 2021-08-23 10:01:23 -07:00
51search 191ee3ad3e
LUCENE-10058: fix gradle lucene:benchmark:run error (#253) 2021-08-23 10:36:33 -04:00
Uwe Schindler 5813292de2 LUCENE-10055: Update Subversion folder for Javadocs 2021-08-22 13:34:09 +02:00
Michael Sokolov 054b444c14 Fix off-by-one in TestDemo.testKnnVectorSearch 2021-08-21 14:22:47 -04:00
Mike Drob c36495dce7
LUCENE-10017 Less verbose exception on IndexFormatTooOld (#200) 2021-08-20 15:40:52 -05:00
Dzung Bui 0c3c8ec09a
LUCENE-10059: Fix an AssertionError when JapaneseTokenizer tries to backtrace from and to the same position (#254)
Co-authored-by: Anh Dung Bui <buidun@amazon.com>
2021-08-20 08:21:58 -04:00
Michael Sokolov 5896e5389a
LUCENE-10057: Use Lucene abstractions to store demo KnnVectorDict (Dawid Weiss) 2021-08-19 16:14:06 -04:00
Michael Sokolov eeb296ce90
LUCENE-8638: remove LegacyBM25Similarity 2021-08-18 15:44:56 -04:00
Michael Sokolov b8210dee7a Close vector dictionary when exiting the demo 2021-08-18 15:43:33 -04:00
Michael Sokolov d1d60e2db6
LUCENE-8638: remove unused deprecated methods and related tests (#248) 2021-08-18 08:19:49 -04:00
Michael Sokolov 666c7a2590
LUCENE-8638: remove deprecated FST get by output 2021-08-18 08:15:31 -04:00
Michael Sokolov a37844aedd
LUCENE-10016: Added KnnVector index/query support to demo 2021-08-18 08:13:59 -04:00
Michael Sokolov 4213f9d3cd
LUCENE-8638: remove long-deprecated Jaspell suggester 2021-08-17 17:45:22 -04:00