35915 Commits

Author SHA1 Message Date
Mayya Sharipova 87255c117d Add change line for LUCENE-9848 2022-05-04 14:22:31 -04:00
Mayya Sharipova dc6a7f9468
LUCENE-9848 Sort HNSW graph neighbors for construction (#862)
* LUCENE-9848 Sort HNSW graph neighbors for construction

Sort HNSW graph neighbors when applying diversity criterion

During HNSW graph construction, when a node already has more
connections than the maximum allowed (maxConn), we need to prune
its connections using a diversity criterion to limit the number of
connections to maxConn.

Currently when we add reverse connections to already existing nodes,
we don't keep them sorted. Thus later, when we apply the diversity
criterion, we may fail to prune the worst (most distant) non-diverse nodes.

This patch makes sure that neighbour connections are always sorted
from best (closest) to worst (most distant), and that the diversity
criterion is applied to nodes from worst to best.

This patch does the following:
- enhance NeighborArray to always keep neighbour nodes sorted according
  to their scores (in desc or asc order). Make NeighborArray aware of
  the order in which the nodes should be sorted.
- make OnHeapHnswGraph aware of the order of similarity function
- make HnswGraphBuilder apply diversity criteria from worst to
  best nodes
- create Lucene90NeighborArray to keep the previous logic of
  NeighborArray for Lucene90Codec
2022-05-04 14:15:14 -04:00
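
The pruning order described in the commit message above can be illustrated with a minimal sketch. This is not Lucene's code: `HnswPruneSketch`, `Neighbors`, `Similarity`, `insertSorted`, `findWorstNonDiverse` and `pruneTo` are all hypothetical names, and the sketch assumes that a higher score means a closer node and that a node counts as non-diverse when it is closer to some better-ranked neighbour than to the base node.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only; all names here are hypothetical, not Lucene classes.
class HnswPruneSketch {

  // Assumption: higher score = more similar (closer).
  interface Similarity {
    float score(int a, int b);
  }

  // Neighbour list kept sorted from best (highest score) to worst.
  static final class Neighbors {
    final List<Integer> nodes = new ArrayList<>();
    final List<Float> scores = new ArrayList<>();

    // Insert at the position that preserves descending score order.
    void insertSorted(int node, float score) {
      int pos = 0;
      while (pos < scores.size() && scores.get(pos) >= score) {
        pos++;
      }
      nodes.add(pos, node);
      scores.add(pos, score);
    }

    void removeAt(int idx) {
      nodes.remove(idx);
      scores.remove(idx);
    }
  }

  // Scan from worst to best; return the index of the first neighbour that is
  // closer to a better-ranked neighbour than to the base node. If every
  // neighbour is diverse, fall back to the worst one.
  static int findWorstNonDiverse(Neighbors n, Similarity sim) {
    for (int i = n.nodes.size() - 1; i > 0; i--) {
      int candidate = n.nodes.get(i);
      float scoreToBase = n.scores.get(i);
      for (int j = i - 1; j >= 0; j--) {
        if (sim.score(candidate, n.nodes.get(j)) > scoreToBase) {
          return i;
        }
      }
    }
    return n.nodes.size() - 1;
  }

  // Remove neighbours one at a time until at most maxConn remain.
  static void pruneTo(Neighbors n, int maxConn, Similarity sim) {
    while (n.nodes.size() > maxConn) {
      n.removeAt(findWorstNonDiverse(n, sim));
    }
  }
}
```

Because the list is sorted, the scan meets the most distant candidates first, so a distant but diverse neighbour survives while a nearer non-diverse one is pruned.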
Gautam Worah c3d47507e9
LUCENE-10524 Add benchmark suite details to CONTRIBUTING.md (#853) 2022-05-03 12:53:20 +09:00
Lu Xugang fe9d26178d
LUCENE-10552: KnnVectorQuery has incorrect equals/hashCode (#859)
* LUCENE-10552: KnnVectorQuery now includes filter in equals/hashCode
2022-05-02 17:58:47 -04:00
Kevin Risden 7efac761f4
LUCENE-10534: MinFloatFunction / MaxFloatFunction calls exists twice (#837) 2022-05-02 13:13:45 -04:00
spike.liu d9d2cb6f09
LUCENE-10188: Give SortedSetDocValues a docValueCount() (#663)
Co-authored-by: vlc刘诚 <chengliu@trip.com>
2022-05-02 10:41:12 -04:00
Tomoko Uchida 5f48469837
Allow to link to github PR from changes (#854) 2022-05-02 23:06:39 +09:00
Michael McCandless 138d40e657
LUCENE-10551: improve testing of LowercaseAsciiCompression (#858) 2022-05-02 08:49:16 -04:00
Kevin Risden 3063109d83
LUCENE-10542: FieldSource exists implementations can avoid value retrieval (#847) 2022-04-29 22:43:16 -04:00
Dawid Weiss 05de9085ce
LUCENE-10539: Return a stream of completions from FSTCompletion. (#844) 2022-04-29 21:35:35 +02:00
Dawid Weiss 75aadb9589
gradle 7.3.3 quick upgrade (#856) 2022-04-29 21:02:19 +02:00
Greg Miller 902a7df0e5
LUCENE-10530: Avoid floating point precision bug in TestTaxonomyFacetAssociations (#848) 2022-04-29 08:57:46 -07:00
Ignacio Vera 0dad9ddae8
LUCENE-10508: Use MIN_WIDE_EXTENT for GeoWideDegenerateHorizontalLine (#855) 2022-04-29 10:21:08 +02:00
Dawid Weiss 6e6c61eb13 LUCENE-10541: Test-framework: limit the default length of MockTokenizer tokens to 255. 2022-04-29 09:41:42 +02:00
Tomoko Uchida c28f575b6d
LUCENE-10493: move n-best logic to analysis-common (#846) 2022-04-29 10:35:30 +09:00
Chris Hostetter 6afb9bc25a LUCENE-10292: prevent thread leak (or test timeout) if exception/assertion failure in test iterator 2022-04-28 15:17:53 -07:00
Chris Hostetter a8d86ea6e8 LUCENE-10292: Suggest: Fix FreeTextSuggester so that getCount() returns results consistent with lookup() during concurrent build()
Fix SuggestRebuildTestUtil to reliably surface this kind of failure that was previously sporadic
2022-04-27 18:14:01 -07:00
Gautam Worah 8d9a333fac
LUCENE-10525 Improve WindowsFS emulation to catch invalid file names (#829)
* Add filename checks for WindowsFS
* don't delegate Path default methods, which makes it easier for subclassing. Also fix delegation bug (endsWith was calling startsWith).
2022-04-27 09:52:47 -04:00
Ignacio Vera 922d3af8d6
LUCENE-10508: Use MIN_WIDE_EXTENT for all wide rectangles (#845) 2022-04-27 11:24:16 +02:00
Ignacio Vera 5d3ab09676
LUCENE-10470: [Tessellator] Fix some failing polygons due to collinear edges (#756)
Check whether the polygon has been successfully tessellated before we fail (we were failing some valid
  tessellations) and allow filtering edges that fold on top of the previous one
2022-04-27 10:24:22 +02:00
Ignacio Vera 2b20b3f2ca
LUCENE-10508: Fix error for rectangles with an extent close to 180 degrees (#824)
This commit introduces a GeoWideRectangle.MIN_WIDE_EXTENT that takes into account the angular resolution
in order to build a GeoWideRectangle.
2022-04-27 07:33:49 +02:00
Greg Miller f11468186a LUCENE-10529: Fix TestTaxonomyFacetAssociations NPE when randomly indexing no documents for dim 2022-04-26 20:13:28 -07:00
Michael Sokolov 2a618586de fix path to jar file in demo documentation 2022-04-26 15:48:21 -04:00
xiaoping ebe2d7b4fd
LUCENE-10499: reduce unnecessary copy data overhead when growing array size (#786)
Co-authored-by: xiaoping.wjp <xiaoping.wjp@alibaba-inc.com>
2022-04-26 15:35:56 +02:00
Dawid Weiss 2966228fae LUCENE-10535: upgrade com.palantir.consistent-versions to 2.10.0 2022-04-26 08:31:15 +02:00
Kevin Risden 223a74fcb5
LUCENE-10533: SpellChecker.formGrams is missing bounds check (#836) 2022-04-25 15:55:50 -04:00
Dawid Weiss a53d05b9f9
Upgrade spotless and use runToFixMessage for 'gradlew tidy' hint. (#834) 2022-04-25 14:51:14 +02:00
Dawid Weiss 2080caff3f
Fix JVM error branch logic. (#835) 2022-04-25 14:33:56 +02:00
Tomoko Uchida c89f8a7ea1
LUCENE-10493: factor out Viterbi algorithm and share it between kuromoji and nori (#805) 2022-04-25 20:09:46 +09:00
Adrien Grand 2a4c21bb58
LUCENE-8836: Speed up TermsEnum#lookupOrd on increasing sequences of ords. (#827) 2022-04-25 09:18:21 +02:00
Robert Muir 1089b482fc
LUCENE-10528: use Xvfb in test to avoid messing up user's desktop (#828)
Co-authored-by: Tomoko Uchida <tomoko.uchida.1111@gmail.com>
2022-04-23 08:00:33 -04:00
gf2121 35ca2d79f7
LUCENE-10315: Speed up DocIdsWriter by ForUtil (#797) 2022-04-23 19:32:02 +08:00
Chris Hegarty 3bcc40efe9
LUCENE-10517: Improve performance of SortedSetDV faceting by iterating on class types (#812) 2022-04-21 18:39:53 +02:00
Chris Hegarty 08f848a582
Add two facet tests (#826) 2022-04-21 18:39:41 +02:00
Robert Muir c897aac077
fail clearly on too-new JDK (#819)
Gradle will give a very confusing error, let's make it absolutely clear.

Co-authored-by: Dawid Weiss <dawid.weiss@carrotsearch.com>
2022-04-21 09:22:26 -04:00
Robert Muir d6461eab0b
improve spotless error to suggest running 'gradlew tidy' (#817)
The current error isn't helpful as it suggests a per-module command. If
the user has modified multiple modules, they will be running gradle
commands to try to fix each one of them, when it would be easier to just
run 'gradlew tidy' a single time and fix everything.
2022-04-21 08:30:10 -04:00
Robert Muir 844bd88839
LUCENE-10526: add single method to mockfile to wrap a Path (#822)
Currently "new FilterPath" is called from everywhere, making it impossible for a mockfilesystem to use a custom subclass.
Add FilterFileSystemProvider.wrapPath(path), which subclasses can override. Fix tests to use it instead of juggling URI objects and passing FileSystems around.
2022-04-20 16:40:10 -04:00
Yuting Gan ec53a72a44
LUCENE-10495: Fix return statement of siblingsLoaded() in TaxonomyFacets (#778) 2022-04-20 12:56:43 -07:00
Adrien Grand 2d278a0efe
Clarify that terms dicts are per-field in block-tree's javadocs. (#823) 2022-04-20 17:19:51 +02:00
Robert Muir e390f33258
Fix incorrect docs in README.md: it must be java 17 exactly, java 18 does not work (#818) 2022-04-20 11:07:24 -04:00
Adrien Grand 7c173b0e1c LUCENE-10153: Make errorprone happy. 2022-04-20 16:47:34 +02:00
Ignacio Vera 4c133f435d
LUCENE-10514: Component2D#Within methods should return NOTWITHIN for triangles within the query geometry (#809)
This commit makes sure we always return NOTWITHIN for fully contained triangles in
Component2D#within* methods
2022-04-20 16:30:29 +02:00
Adrien Grand 15ecf3c27f LUCENE-10503: Fix JIRA number in CHANGES. 2022-04-19 15:40:53 +02:00
Luca Cavanna 866bb86a1c
LUCENE-10506: change visibility of ProfilerCollector#deriveCollectorName to protected (#799)
This allows subclasses to extend how the inner collector name is derived.
2022-04-19 15:36:11 +02:00
Adrien Grand d9e37f3123
LUCENE-10153: Improve accuracy of scaled scores in WANDScorer. (#794) 2022-04-19 15:26:24 +02:00
Mike McCandless fb76d0b104 LUCENE-10482, LUCENE-10521: hrmph, put the @Ignore in the right place 2022-04-19 07:19:15 -04:00
Mike McCandless c388705855 LUCENE-10482: Ignore this test for now 2022-04-18 17:14:04 -04:00
Tomoko Uchida 872349cef9
Add some basic tasks to help/workflow (#811) 2022-04-18 11:34:28 +09:00
Gautam Worah d322be52f2
LUCENE-10482 Bug Fix: Don't use Instant.now() as prefix for the temp dir name (#814)
* Don't use Instant.now() as prefix for the temp dir name

* spotless
2022-04-17 21:18:08 -04:00
Gautam Worah 10ebc099c8
LUCENE-10482 Allow users to create their own DirectoryTaxonomyReaders with empty taxoArrays instead of letting the taxoEpoch decide (#762) 2022-04-15 10:45:02 -07:00