Commit Graph

35820 Commits

Author SHA1 Message Date
Greg Miller 3b3f9600c2
Fix a DrillSideways unit test I broke when adding more tests in LUCENE-10060 (#268) 2021-08-26 14:44:52 -07:00
Greg Miller dbf7e1865f
LUCENE-10060: Ensure DrillSidewaysQuery instances never get cached (#261) 2021-08-26 06:06:54 -07:00
Adrien Grand f1fdd2465c
LUCENE-9917: Smaller block sizes for BEST_SPEED. (#257)
This reduces the block size for BEST_SPEED in order to trade some compression
ratio in exchange for better retrieval speed.
2021-08-26 15:04:51 +02:00
Dawid Weiss f6e3b08ae9
LUCENE-10072: Regenerate FST dictionaries after LUCENE-9047. (#265) 2021-08-26 11:31:16 +02:00
Dawid Weiss 39a2fc62d4
LUCENE-10066: Build does not work with JDK16 as gradle's runtime (#259) 2021-08-26 10:08:37 +02:00
Adrien Grand 2d7590a355
LUCENE-9613, LUCENE-10067: Further specialize ordinals. (#260) 2021-08-26 09:44:24 +02:00
David Smiley 8ac2673791
LUCENE-10003: No C style array declaration (#206)
Most C-style array declarations have been switched. The Google Java Format, which we adhere to, disallows C-style array declarations: https://google.github.io/styleguide/javaguide.html#s4.8.3-arrays
Some cases (esp. Snowball) can't be updated.
2021-08-25 17:06:41 -04:00
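
For readers unfamiliar with the distinction in LUCENE-10003, the following is a minimal illustration of the two declaration styles. The class and method names are hypothetical and do not come from the Lucene code base.

```java
// Hypothetical example; the names are illustrative only.
final class ArrayDeclarationStyle {
  // C-style: the brackets follow the variable name. This is legal Java,
  // but it is the form the linked style guide section disallows.
  static int sumCStyle(int values[]) {
    int total = 0;
    for (int v : values) {
      total += v;
    }
    return total;
  }

  // Java-style: the brackets are part of the type.
  static int sumJavaStyle(int[] values) {
    int total = 0;
    for (int v : values) {
      total += v;
    }
    return total;
  }
}
```

Both methods compile to the same behavior; only the declaration syntax differs.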
Michael McCandless 88588e3dea
LUCENE-10052: cutover more tests to newBytesRef, and finally catches a fly (FSTTermsReader.IntersectEnum was illegally ignoring BytesRef.offset, yay!) (#258) 2021-08-25 12:18:23 -04:00
Adrien Grand 8917fbe039 LUCENE-9613, LUCENE-10067: Add more specialization for the ordinals case. 2021-08-25 14:34:04 +02:00
Dawid Weiss 45868a52f1 LUCENE-9990: upgrade to gradle 7.2. 2021-08-25 10:04:42 +02:00
Dawid Weiss 0d07104de0 Piggyback spotless upgrade to 5.14.3 2021-08-25 10:03:59 +02:00
Dawid Weiss a8d4f658de Upgrade to gradle 7.2 2021-08-25 10:03:59 +02:00
Dawid Weiss 0cbafa4879 Fix gradle error hints. 2021-08-25 10:03:59 +02:00
Dawid Weiss fdccdee734 Move logging to info level. 2021-08-25 10:03:59 +02:00
Dawid Weiss 26eb84a3b5 Fix immutable properties. Fix ant uri namespace no longer working (seems like gradle regression). 2021-08-25 10:03:59 +02:00
Dawid Weiss 2b0378cd4a Use JavaInfo instead of toolchains. Internal but works and is free of toolchain's quirks. 2021-08-25 10:03:59 +02:00
Dawid Weiss 68cf86ba35 Experiments with the new apis. 2021-08-25 10:03:59 +02:00
Dawid Weiss 72f373791e Upgrade palantir's plugin. 2021-08-25 10:03:59 +02:00
Dawid Weiss 3ff4263535 Upgrade gradle to 7.1.1 2021-08-25 10:03:59 +02:00
Dawid Weiss 523cea2c5d Revert "Adding initial patch by Gautam Worah" (restore pristine main)
This reverts commit 067ab4f503aabea59639e692e3ea9ee30750c68e.
2021-08-25 10:03:59 +02:00
Dawid Weiss bac22d6116 Adding initial patch by Gautam Worah 2021-08-25 10:03:59 +02:00
Mayya Sharipova fc67d6aa6e Revert "LUCENE-10054 Make HnswGraph hierarchical (#250)"
This reverts commit 257d256def.

We've decided to have a separate feature branch for HNSW,
and put all related changes there.
2021-08-24 14:58:59 -04:00
Julie Tibshirani 782c3cca3a
LUCENE-10040: Relax TestKnnVectorQuery#testDeletes assertion (#251)
TestKnnVectorQuery#testDeletes assumes that if there are n total documents, we
can perform a kNN search with k=n and retrieve all documents. This isn't true
with our implementation -- due to randomization we may select fewer than n entry
points and never visit some vectors.
2021-08-24 11:15:27 -07:00
Adrien Grand 83ba5d859c
LUCENE-7020: Remove TieredMergePolicy#setMaxMergeAtOnceExplicit. (#230)
TieredMergePolicy no longer bounds the number of segments that can be merged via
a forced merge.
2021-08-24 10:27:00 +02:00
Mayya Sharipova 257d256def
LUCENE-10054 Make HnswGraph hierarchical (#250)
Currently HNSW has only a single layer.
This is the first part to make it multi-layered.

To keep changes small, this PR only adds
multiple layers in the HnswGraph class.

TODO for following PRs:
- modify graph construction and search algorithm for a hierarchical
graph.
- modify Lucene90HnswVectorsWriter and Lucene90HnswVectorsReader to
write and read multiple layers
2021-08-23 15:54:26 -04:00
Greg Miller 46fa09d265
LUCENE-5309: Optimize facet counting for single-valued SSDV / StringValueFacetCounts (#255) 2021-08-23 10:01:23 -07:00
51search 191ee3ad3e
LUCENE-10058: fix gradle lucene:benchmark:run error (#253) 2021-08-23 10:36:33 -04:00
Uwe Schindler 5813292de2 LUCENE-10055: Update Subversion folder for Javadocs 2021-08-22 13:34:09 +02:00
Michael Sokolov 054b444c14 Fix off-by-one in TestDemo.testKnnVectorSearch 2021-08-21 14:22:47 -04:00
Mike Drob c36495dce7
LUCENE-10017 Less verbose exception on IndexFormatTooOld (#200) 2021-08-20 15:40:52 -05:00
Dzung Bui 0c3c8ec09a
LUCENE-10059: Fix an AssertionError when JapaneseTokenizer tries to backtrace from and to the same position (#254)
Co-authored-by: Anh Dung Bui <buidun@amazon.com>
2021-08-20 08:21:58 -04:00
Michael Sokolov 5896e5389a
LUCENE-10057: Use Lucene abstractions to store demo KnnVectorDict (Dawid Weiss) 2021-08-19 16:14:06 -04:00
Michael Sokolov eeb296ce90
LUCENE-8638: remove LegacyBM25Similarity 2021-08-18 15:44:56 -04:00
Michael Sokolov b8210dee7a Close vector dictionary when exiting the demo 2021-08-18 15:43:33 -04:00
Michael Sokolov d1d60e2db6
LUCENE-8638: remove unused deprecated methods and related tests (#248) 2021-08-18 08:19:49 -04:00
Michael Sokolov 666c7a2590
LUCENE-8638: remove deprecated FST get by output 2021-08-18 08:15:31 -04:00
Michael Sokolov a37844aedd
LUCENE-10016: Added KnnVector index/query support to demo 2021-08-18 08:13:59 -04:00
Michael Sokolov 4213f9d3cd
LUCENE-8638: remove long-deprecated Jaspell suggester 2021-08-17 17:45:22 -04:00
Michael McCandless 65a53450dc
LUCENE-10052: first cut at LTC.newBytesRef methods, and switching a few test cases over (#245)
* LUCENE-10052: first cut at LTC.newBytesRef methods, to randomize the offset/length of a BytesRef, and switching a few test cases over
2021-08-17 16:18:40 -04:00
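
The idea in the LUCENE-10052 entry above, randomizing the offset and length of a byte slice so that code ignoring the offset fails in tests, can be sketched as follows. This is a simplified stand-in under stated assumptions: `SliceSketch` and `randomlyPadded` are hypothetical names, and this is neither Lucene's `BytesRef` class nor the actual `newBytesRef` implementation.

```java
import java.util.Arrays;
import java.util.Random;

// Minimal stand-in for a (bytes, offset, length) slice; NOT Lucene's BytesRef.
final class SliceSketch {
  final byte[] bytes;
  final int offset;
  final int length;

  SliceSketch(byte[] bytes, int offset, int length) {
    this.bytes = bytes;
    this.offset = offset;
    this.length = length;
  }

  // A correct consumer reads only bytes[offset .. offset + length).
  byte[] toArray() {
    return Arrays.copyOfRange(bytes, offset, offset + length);
  }

  // Hypothetical test helper: embeds the payload in a larger backing array
  // filled with random bytes, so the slice starts at a (usually) non-zero offset.
  static SliceSketch randomlyPadded(byte[] payload, Random random) {
    int prefix = random.nextInt(8);
    int suffix = random.nextInt(8);
    byte[] backing = new byte[prefix + payload.length + suffix];
    random.nextBytes(backing);
    System.arraycopy(payload, 0, backing, prefix, payload.length);
    return new SliceSketch(backing, prefix, payload.length);
  }
}
```

Code that reads from index 0 of `bytes`, or up to `bytes.length`, will see the random padding and produce wrong results, which is how this kind of randomization surfaces bugs that ignore the offset.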
Michael Sokolov 2d21a600ba
LUCENE-8638: remove deprecated code (#243) 2021-08-17 13:51:04 -04:00
Julie Tibshirani 29ed3908ea LUCENE-9614: Small fixes to KnnVectorQuery hashCode and toString 2021-08-16 09:10:53 -07:00
Julie Tibshirani e48be684b2
LUCENE-9614: Prevent TestKnnVectorQuery from using simple text codec (#244)
The simple text codec doesn't support kNN searches, so the test will fail when
we randomly choose to use it.
2021-08-16 09:11:03 -07:00
Julie Tibshirani 6993fb9a99
LUCENE-10040: Handle deletions in nearest vector search (#239)
This PR extends VectorReader#search to take a parameter specifying the live
docs. LeafReader#searchNearestVectors then always returns the k nearest
undeleted docs.

To implement this, the HNSW algorithm will only add a candidate to the result
set if it is a live doc. The graph search still visits and traverses deleted
docs as it gathers candidates.
2021-08-16 07:44:17 -07:00
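
The behavior described in the LUCENE-10040 entry above (deleted docs are still traversed, but only live docs enter the result set) can be sketched roughly as below. This is a deliberately simplified breadth-first traversal over a plain adjacency array, not the HNSW algorithm and not Lucene's API; the class name, method name, and parameters are all hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.BitSet;
import java.util.Comparator;
import java.util.Deque;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical sketch; not Lucene code.
final class LiveDocsSearchSketch {
  // Returns up to k live nodes reachable from entryPoint, best score first.
  static List<Integer> search(
      int[][] neighbors, float[] scores, BitSet liveDocs, int entryPoint, int k) {
    Comparator<Integer> byScore = Comparator.comparingDouble(n -> scores[n]);
    // Min-heap on score, so the weakest result is evicted first.
    PriorityQueue<Integer> results = new PriorityQueue<>(byScore);
    boolean[] visited = new boolean[neighbors.length];
    Deque<Integer> toVisit = new ArrayDeque<>();
    visited[entryPoint] = true;
    toVisit.add(entryPoint);

    while (!toVisit.isEmpty()) {
      int node = toVisit.poll();
      // Only live docs may enter the result set.
      if (liveDocs.get(node)) {
        results.add(node);
        if (results.size() > k) {
          results.poll();
        }
      }
      // Deleted docs are still traversed so nodes behind them stay reachable.
      for (int next : neighbors[node]) {
        if (!visited[next]) {
          visited[next] = true;
          toVisit.add(next);
        }
      }
    }

    List<Integer> ordered = new ArrayList<>(results);
    ordered.sort(byScore.reversed());
    return ordered;
  }
}
```

In a chain 0-1-2-3 where node 1 is deleted, a search from node 0 still reaches nodes 2 and 3 through node 1, while node 1 itself never appears in the results.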
Mike McCandless 19e5c00a4f LUCENE-10014: fix performance bug: when writing doc values with block GCD compression we were unnecessarily wasting index storage by failing to take full advantage of the GCD compression 2021-08-16 08:40:02 -04:00
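
As background for the LUCENE-10014 entry above, GCD compression in general stores each value as a small quotient relative to a shared minimum and greatest common divisor. The sketch below shows only that general idea; it is not Lucene's doc values writer and does not reproduce the bug or its fix. All names are hypothetical, and the code assumes the differences `value - min` do not overflow a `long`.

```java
// Hypothetical sketch of the general GCD-compression idea; not Lucene code.
final class GcdCompressionSketch {
  // Returns {min, gcd, q0, q1, ...} such that values[i] == min + gcd * q_i.
  static long[] encode(long[] values) {
    long min = Long.MAX_VALUE;
    for (long v : values) {
      min = Math.min(min, v);
    }
    long gcd = 0;
    for (long v : values) {
      gcd = gcd(gcd, v - min);
    }
    if (gcd == 0) {
      gcd = 1; // all values are equal (or there are none)
    }
    long[] out = new long[values.length + 2];
    out[0] = min;
    out[1] = gcd;
    for (int i = 0; i < values.length; i++) {
      out[i + 2] = (values[i] - min) / gcd;
    }
    return out;
  }

  // Reverses encode().
  static long[] decode(long[] encoded) {
    long min = encoded[0];
    long gcd = encoded[1];
    long[] values = new long[encoded.length - 2];
    for (int i = 0; i < values.length; i++) {
      values[i] = min + gcd * encoded[i + 2];
    }
    return values;
  }

  // Euclid's algorithm on non-negative inputs.
  private static long gcd(long a, long b) {
    while (b != 0) {
      long t = a % b;
      a = b;
      b = t;
    }
    return a;
  }
}
```

The quotients are typically much smaller than the original values, so they can be packed with fewer bits per value.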
Mike McCandless b18f714096 LUCENE-10008: add CHANGES entry 2021-08-13 14:47:53 -04:00
Vigya Sharma cb4c8ae07f
LUCENE-10008: Respect ignoreCase flag in CommonGramsFilterFactory and factor out a common abstract base class AbstractWordsFileFilterFactory.java (#188) 2021-08-13 14:45:58 -04:00
Michael Sokolov 624560a3d7
LUCENE-9614: add KnnVectorQuery implementation 2021-08-13 12:15:40 -04:00
Julie Tibshirani a9fb5a965d
LUCENE-10043: Decrease default LRUQueryCache#skipCacheFactor to 10 (#232)
In LUCENE-9002 we introduced logic to skip caching a clause if it would be too
expensive compared to the usual query cost. Specifically, we avoid caching a
clause if its cost is estimated to be 250x higher than the lead iterator's.
We've found that the default of 250 is quite high and can lead to poor tail
latencies. This PR decreases it to 10 to cache more conservatively.
2021-08-11 13:29:12 +03:00
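
The skip rule described in the LUCENE-10043 entry above can be sketched as a single comparison. This is an illustration of the described behavior only: the class and method names are hypothetical, and the exact comparison used inside `LRUQueryCache` may differ.

```java
// Hypothetical sketch of the described rule; not the LRUQueryCache source.
final class SkipCacheSketch {
  // Returns true when the clause is considered too expensive to cache
  // relative to the lead iterator's estimated cost.
  static boolean shouldSkipCaching(long clauseCost, long leadCost, double skipCacheFactor) {
    return clauseCost / skipCacheFactor > leadCost;
  }
}
```

With a clause cost of 1000 and a lead cost of 10, a factor of 250 still allows caching (1000 / 250 = 4, which is not above 10), while a factor of 10 skips it (1000 / 10 = 100, which is above 10). Lowering the factor therefore makes caching more conservative.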
Mike McCandless 931ff63232 LUCENE-9963: add CHANGES entry 2021-08-09 16:11:31 -04:00
Geoffrey Lawson 647255b4d2
LUCENE-9963 Improve FlattenGraphFilter's robustness when handling incoming token graphs with holes (#157)
6 main improvements:
    1) Iterate through all output.InputNodes since dest gaps can exist.
    2) freeBefore the minimum input node instead of the first input node (which was usually, but not always, the minimum).
    3) Don't freeBefore from a hole source node. Bookkeeping may not be correct and could result in an early free.
    4) When adding an output node after hole recovery, calculate its new position increment instead of adding it to the end of the output graph.
    5) Nodes after holes that have edges to their source will do the output re-mapping that the deleted node would have done.
    6) If a disconnected input node swaps order with another node in the output, then map them to the same output node.

Co-authored-by: Lawson <geoffrl@amazon.com>
2021-08-09 16:06:53 -04:00