Commit Graph

36141 Commits

Author SHA1 Message Date
Greg Miller e01b65d284 CHANGES entry for LUCENE-10488 2022-05-13 16:02:57 -07:00
Yuting Gan f0ec226167
LUCENE-10488: Optimize Facets#getTopDims in FloatTaxonomyFacets (#806) 2022-05-13 15:54:41 -07:00
Yuting Gan 57f8cb2fd6
LUCENE-10488: Optimize Facets#getTopDims in IntTaxonomyFacets (#779) 2022-05-13 15:54:31 -07:00
Yuting Gan ef43242d77
LUCENE-10488: Optimized getTopDims in ConcurrentSSDVFacetCounts (#777) 2022-05-13 15:54:18 -07:00
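The three LUCENE-10488 commits optimize `getTopDims`, which selects the top dimensions by aggregated value. The commits don't show their code, so the following is only a hedged sketch of the general technique, a bounded min-heap that keeps the best N entries without sorting every dimension. `TopDims`, `DimValue` and `topDims` are hypothetical names, not Lucene types.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

final class TopDims {
  // Hypothetical (dimension, aggregated value) pair; not a Lucene class.
  static final class DimValue {
    final String dim;
    final long value;
    DimValue(String dim, long value) { this.dim = dim; this.value = value; }
  }

  // Lower value is "worse"; ties broken by dim name so the order is total.
  private static final Comparator<DimValue> WORST_FIRST =
      Comparator.<DimValue>comparingLong(d -> d.value)
          .thenComparing((DimValue d) -> d.dim, Comparator.<String>reverseOrder());

  // Keeps at most topN entries in a min-heap whose head is the current worst.
  static List<DimValue> topDims(List<DimValue> all, int topN) {
    PriorityQueue<DimValue> heap = new PriorityQueue<>(WORST_FIRST);
    for (DimValue dv : all) {
      if (heap.size() < topN) {
        heap.add(dv);
      } else if (topN > 0 && WORST_FIRST.compare(dv, heap.peek()) > 0) {
        heap.poll(); // evict the current worst entry
        heap.add(dv);
      }
    }
    List<DimValue> result = new ArrayList<>(heap);
    result.sort(WORST_FIRST.reversed()); // best first
    return result;
  }
}
```

This costs O(d log N) for d dimensions instead of O(d log d) for a full sort.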
Julie Tibshirani 2cca0e8441 LUCENE-10564: Fix errorprone warning
This slipped through in the original commit because we only enable errorprone on
nightly runs.
2022-05-12 17:31:55 -07:00
Julie Tibshirani 802f5422c0 Add CHANGES entry for LUCENE-10564 2022-05-12 13:37:47 -07:00
Julie Tibshirani 3afc9fa966
LUCENE-10564: Make sure SparseFixedBitSet#or updates memory usage (#882)
Before, it didn't update the estimated memory usage, so calls to ramBytesUsed
could be totally off.
2022-05-12 13:29:07 -07:00
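The bug class fixed here is a cached memory estimate that one mutating method forgets to update. The toy sparse bit set below is a hypothetical stand-in for SparseFixedBitSet (its per-word cost and base overhead are assumed constants). It routes both `set` and `or` through one helper, so every newly allocated word is counted.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical toy sparse bit set; not Lucene's SparseFixedBitSet.
final class ToySparseBitSet {
  private static final long BASE_BYTES = 64;     // assumed fixed overhead
  private static final long BYTES_PER_WORD = 32; // assumed cost of one allocated 64-bit word
  private final Map<Integer, Long> words = new HashMap<>();
  private long ramBytesUsed = BASE_BYTES;

  void set(int bit) {
    orWord(bit >>> 6, 1L << bit); // a long shift only uses the low 6 bits of 'bit'
  }

  boolean get(int bit) {
    Long w = words.get(bit >>> 6);
    return w != null && (w & (1L << bit)) != 0;
  }

  // ORs another set into this one; new words must be reflected in the estimate.
  void or(ToySparseBitSet other) {
    for (Map.Entry<Integer, Long> e : other.words.entrySet()) {
      orWord(e.getKey(), e.getValue());
    }
  }

  private void orWord(int index, long bits) {
    Long prev = words.get(index);
    if (prev == null) {
      words.put(index, bits);
      ramBytesUsed += BYTES_PER_WORD; // keep the estimate in sync with allocations
    } else {
      words.put(index, prev | bits);
    }
  }

  long ramBytesUsed() { return ramBytesUsed; }
}
```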
Mayya Sharipova ea5c40686f
LUCENE-10527 Use 2*maxConn for last layer in HNSW (#872)
The original HNSW paper (https://arxiv.org/pdf/1603.09320.pdf) suggests
using a different maxConn for the upper layers than for the bottom one
(which contains the full neighborhood graph). Specifically, it
suggests maxConn=M for the upper layers and maxConn=2*M for the bottom layer.

This patch ensures that we follow this recommendation and use
maxConn=2*M for the bottom layer.
2022-05-12 15:22:25 -04:00
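The rule from the paper reduces to a per-level connection limit. This is a minimal sketch of that rule only; `HnswMaxConn` and `maxConnOnLevel` are hypothetical names, not Lucene's HnswGraphBuilder API.

```java
// Hypothetical helper: M connections per node on upper layers,
// 2*M on the bottom layer (level 0), which contains every node.
final class HnswMaxConn {
  static int maxConnOnLevel(int level, int m) {
    if (level < 0) {
      throw new IllegalArgumentException("level must be >= 0: " + level);
    }
    return level == 0 ? 2 * m : m;
  }
}
```

With M=16, nodes may keep up to 32 neighbors on level 0 and 16 on every level above it.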
Adrien Grand 8f89db8048
LUCENE-10536: Slightly better compression of doc values' terms dictionaries. (#838)
Doc values terms dictionaries keep the first term of each block uncompressed so
that they can somewhat efficiently perform binary searches across blocks.
Suffixes of the other 63 terms are compressed together using LZ4 to leverage
redundancy across suffixes. This change improves compression a bit by using the
first (uncompressed) term of each block as a dictionary when compressing
suffixes of the 63 other terms. This helps with compressing the first few
suffixes when there's not much context yet that can be leveraged to find
duplicates.
2022-05-12 10:32:58 +02:00
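The JDK has no LZ4, so this hedged sketch swaps in DEFLATE's preset-dictionary support (`java.util.zip.Deflater#setDictionary`) to show the same idea: priming the compressor with the block's uncompressed first term gives early suffixes something to match against. `DictCompression` is a hypothetical helper, and the byte format is zlib's, not Lucene's.

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

final class DictCompression {
  // Compresses 'data', optionally priming the compressor with 'dict'.
  static byte[] compress(byte[] data, byte[] dict) {
    Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
    try {
      if (dict != null) {
        deflater.setDictionary(dict); // e.g. the block's first, uncompressed term
      }
      deflater.setInput(data);
      deflater.finish();
      ByteArrayOutputStream out = new ByteArrayOutputStream();
      byte[] buf = new byte[256];
      while (!deflater.finished()) {
        int n = deflater.deflate(buf);
        out.write(buf, 0, n);
      }
      return out.toByteArray();
    } finally {
      deflater.end();
    }
  }

  // Decompresses; the same dictionary must be supplied when the stream asks for it.
  static byte[] decompress(byte[] compressed, byte[] dict) {
    Inflater inflater = new Inflater();
    try {
      inflater.setInput(compressed);
      ByteArrayOutputStream out = new ByteArrayOutputStream();
      byte[] buf = new byte[256];
      while (!inflater.finished()) {
        int n = inflater.inflate(buf);
        if (n == 0 && inflater.needsDictionary()) {
          inflater.setDictionary(dict);
        } else if (n == 0 && inflater.needsInput()) {
          throw new IllegalStateException("truncated input");
        }
        out.write(buf, 0, n);
      }
      return out.toByteArray();
    } catch (DataFormatException e) {
      throw new IllegalStateException(e);
    } finally {
      inflater.end();
    }
  }
}
```

The reader must have the dictionary before it can decode anything, which in the terms-dictionary setting is free because the first term is stored uncompressed.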
zacharymorn 96036bca9f
LUCENE-10411: Add NN vectors support to ExitableDirectoryReader (#833) 2022-05-11 22:26:35 -07:00
Lu Xugang a06460a538
LUCENE-10502: add changes entry (#881) 2022-05-11 21:24:22 -04:00
Lu Xugang 6040d1648f
LUCENE-10502: Use IndexedDISI to store docIds and DirectMonotonicWriter/Reader to handle ordToDoc (#880)
Currently, for sparse vector fields, the doc IDs of all documents with vectors, across all fields, are fully loaded into memory.
This happens not only when we run a vector search, but also when we open an index and
load the meta info for the vector readers.

This patch instead uses IndexedDISI to store doc IDs and DirectMonotonicWriter/Reader to
handle the ordToDoc mapping. The benefits are reduced memory usage and faster loading of
meta info for vector readers.
2022-05-11 13:18:10 -04:00
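Why a monotonic encoding suits ordToDoc: doc IDs increase with the vector ordinal, so each one is close to a straight line through the first and last doc IDs. This hedged sketch stores only the residuals from that line. `OrdToDoc` is hypothetical; Lucene's DirectMonotonicWriter works per block and bit-packs the residuals, while this keeps them in an `int[]` and only reports how many bits they would need.

```java
// Hypothetical sketch of an ord -> docId mapping for a sparse vector field.
final class OrdToDoc {
  private final int first;        // docId of ord 0
  private final float slope;      // average docId gap per ord
  private final long minResidual; // subtracted so stored residuals are >= 0
  private final int[] residuals;  // a real encoder would bit-pack these
  private final int bitsPerValue; // bits a bit-packed residual would need

  OrdToDoc(int[] docIds) { // docIds must be strictly increasing
    int n = docIds.length;
    first = n == 0 ? 0 : docIds[0];
    slope = n <= 1 ? 0f : (float) (docIds[n - 1] - docIds[0]) / (n - 1);
    long[] raw = new long[n];
    long min = 0;
    for (int ord = 0; ord < n; ord++) {
      raw[ord] = docIds[ord] - estimate(ord);
      min = ord == 0 ? raw[ord] : Math.min(min, raw[ord]);
    }
    minResidual = min;
    residuals = new int[n];
    int max = 0;
    for (int ord = 0; ord < n; ord++) {
      residuals[ord] = (int) (raw[ord] - minResidual);
      max = Math.max(max, residuals[ord]);
    }
    bitsPerValue = 32 - Integer.numberOfLeadingZeros(max);
  }

  // Linear estimate; encoding and decoding use the identical expression.
  private long estimate(int ord) {
    return first + (long) (slope * ord);
  }

  int ordToDoc(int ord) {
    return (int) (estimate(ord) + minResidual + residuals[ord]);
  }

  // Binary search works because docIds increase with ord; -1 means "no vector".
  int docToOrd(int doc) {
    int lo = 0, hi = residuals.length - 1;
    while (lo <= hi) {
      int mid = (lo + hi) >>> 1;
      int d = ordToDoc(mid);
      if (d < doc) lo = mid + 1;
      else if (d > doc) hi = mid - 1;
      else return mid;
    }
    return -1;
  }

  int bitsPerValue() { return bitsPerValue; }
}
```

For evenly spread doc IDs the residuals are tiny, so a few bits per document replace a full 32-bit doc ID.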
Adrien Grand 54595611ae LUCENE-10496: CHANGES entry. 2022-05-11 11:38:44 +02:00
xiaoping e49708e01d
LUCENE-10496: avoid unnecessary attempts to evaluate skipping doc if index sort and search sort are in opposite directions (#780) 2022-05-11 11:32:07 +02:00
xiaoping 6c6bb00cec
LUCENE-10555: fix iteratorCost initial logic error (#878) 2022-05-11 08:36:24 +02:00
Adrien Grand 8476ac1f6a Fix rare test failures in TestSortOptimization.
The skipping logic relies on the points index telling us by how much we can
reduce the candidate set by applying a filter that only matches documents that
compare better than the bottom value.

Some randomized points formats have large numbers of points per leaf and
produce estimates of point counts for range queries that are far above the
actual value, which in turn prevents skipping in cases where we expect it. To
avoid running into this corner case, this change forces the default codec on
this test.
2022-05-10 17:16:42 +02:00
xiaoping 102483bc57
fix bkd test logic error and doc error (#863) 2022-05-10 13:10:00 +02:00
xiaoping f431511cb7 LUCENE-10555: avoid NumericLeafComparator#iteratorCost repeated initialization when NumericLeafComparator#setScorer is called (#864) 2022-05-10 13:07:41 +02:00
Robert Muir 3edfeb5eb2
LUCENE-10532: remove @Slow annotation (#832)
Remove `@Slow` annotation, for more consistency with CI and local jobs. All tests can be fast!
2022-05-09 23:03:55 -04:00
Ramin ALirezaee 111d6b186e
LUCENE-10312: Add PersianStemmer (#540)
Co-authored-by: Tomoko Uchida <tomoko.uchida.1111@gmail.com>
2022-05-07 17:09:56 +09:00
Uwe Schindler 8aa4a56491
LUCENE-10558: Implement URL ctor to support classpath/module usage in Kuromoji and Nori dictionaries (main branch) (#871) 2022-05-06 16:49:56 +02:00
Alan Woodward 5f832c64bf
LUCENE-10436: Reinstate public getDocValuesDocIdSetIterator method on DocValues (#869)
The method moved from DocValuesFieldExistsQuery to DocValuesIterator, but the latter
is a package-private utility class, making it invisible to client code. This commit moves it
back onto FieldExistsQuery, meaning that the upgrade path will be the same as for all other
uses of DocValuesFieldExistsQuery.
2022-05-06 09:41:28 +01:00
Uwe Schindler 14dcc9c9ce Disable liftbot, we have our own tools 2022-05-05 22:27:57 +02:00
Adrien Grand 26301898b2
LUCENE-10553: Fix WANDScorer's handling of 0 and +Infty. (#860)
The computation of the scaling factor has special cases for these two values,
but the current logic is backwards.
2022-05-05 10:24:28 +01:00
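The commit doesn't show the corrected logic, so this is only a hedged sketch of one consistent way to handle the two special values when a power-of-two scaling factor is derived from a score's exponent. `ScoreScaling`, `scalingFactor` and the target exponent are hypothetical. `Math.getExponent` returns out-of-range sentinels for 0 and infinity, so the sketch assumes 0 should scale at least as much as the smallest positive float, and +Infinity no more than the largest finite float.

```java
// Hypothetical sketch; not Lucene's WANDScorer.
final class ScoreScaling {
  static final int TARGET_EXPONENT = 20; // assumed: scalb(score, factor) lands near 2^20

  static int scalingFactor(float f) {
    if (!(f >= 0)) { // also rejects NaN
      throw new IllegalArgumentException("scores must be non-negative: " + f);
    }
    if (f == 0f) {
      return scalingFactor(Float.MIN_VALUE) + 1; // scale 0 more than any positive score
    }
    if (Float.isInfinite(f)) {
      return scalingFactor(Float.MAX_VALUE) - 1; // scale +Inf less than any finite score
    }
    // As a double, every positive float (even subnormal) is normal, so the exponent is exact.
    return TARGET_EXPONENT - Math.getExponent((double) f);
  }
}
```

Swapping the `+ 1` and `- 1` branches would invert that ordering, which is one plausible reading of logic being "backwards".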
Tomoko Uchida a89c57f35f
Make CONTRIBUTING.md a bit more succinct (#866) 2022-05-05 10:35:33 +09:00
Michael Sokolov 7fbaa63dd1
LUCENE-10504: KnnGraphTester to use KnnVectorQuery (#796)
2022-05-04 18:22:48 -04:00
Mayya Sharipova 87255c117d Add change line for LUCENE-9848 2022-05-04 14:22:31 -04:00
Mayya Sharipova dc6a7f9468
LUCENE-9848 Sort HNSW graph neighbors for construction (#862)
Sort HNSW graph neighbors when applying diversity criterion

During HNSW graph construction, when a node already has more
connections than the maximum allowed (maxConn), we need to prune
its connections using a diversity criterion to limit the number of
connections to maxConn.

Currently, when we add reverse connections to already existing nodes,
we don't keep them sorted. Thus later, when we apply the diversity
criterion, we may fail to prune the worst (most distant) non-diverse nodes.

This patch makes sure that neighbor connections are always sorted
from best (closest) to worst (most distant), and that the diversity
criterion is applied to nodes from worst to best.

This patch does the following:
- enhance NeighborArray to always keep neighbor nodes sorted according
  to their scores (in descending or ascending order). Make NeighborArray
  aware of the order in which the nodes should be sorted.
- make OnHeapHnswGraph aware of the order of the similarity function
- make HnswGraphBuilder apply the diversity criterion from worst to
  best nodes
- create Lucene90NeighborArray to keep the previous logic of
  NeighborArray for Lucene90Codec
2022-05-04 14:15:14 -04:00
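A hedged sketch of the two ideas above: neighbors kept sorted best-first on insertion, and pruning that scans from worst to best, dropping the first neighbor that is closer to a better-ranked neighbor than to the base node (the usual HNSW diversity heuristic). `SortedNeighbors` and its `Similarity` interface are hypothetical, higher scores are assumed to mean "closer", and this is not Lucene's NeighborArray or HnswGraphBuilder.

```java
import java.util.ArrayList;
import java.util.List;

final class SortedNeighbors {
  interface Similarity { float score(int a, int b); } // higher = closer (assumed)

  private final List<Integer> nodes = new ArrayList<>();
  private final List<Float> scores = new ArrayList<>(); // descending: best first

  // Binary search for the first slot whose score is lower than the new one.
  void insertSorted(int node, float score) {
    int lo = 0, hi = scores.size();
    while (lo < hi) {
      int mid = (lo + hi) >>> 1;
      if (scores.get(mid) >= score) lo = mid + 1; else hi = mid;
    }
    nodes.add(lo, node);
    scores.add(lo, score);
  }

  // Prunes to maxConn, removing non-diverse neighbors from worst to best;
  // if every neighbor is diverse, the worst one is dropped.
  void prune(int maxConn, Similarity sim) {
    while (nodes.size() > maxConn) {
      int victim = nodes.size() - 1;
      for (int i = nodes.size() - 1; i > 0; i--) {
        if (!isDiverse(i, sim)) { victim = i; break; }
      }
      nodes.remove(victim);  // List.remove(int index)
      scores.remove(victim);
    }
  }

  // Neighbor i is diverse if no better-ranked neighbor is at least as close to it
  // as the base node is.
  private boolean isDiverse(int i, Similarity sim) {
    for (int j = 0; j < i; j++) {
      if (sim.score(nodes.get(i), nodes.get(j)) >= scores.get(i)) return false;
    }
    return true;
  }

  List<Integer> nodes() { return nodes; }
}
```

With unsorted neighbors, the "worst" slot at the end of the list would not actually be the most distant node, which is the failure the commit describes.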
Gautam Worah c3d47507e9
LUCENE-10524 Add benchmark suite details to CONTRIBUTING.md (#853) 2022-05-03 12:53:20 +09:00
Lu Xugang fe9d26178d
LUCENE-10552: KnnVectorQuery has incorrect equals/hashCode (#859)
* LUCENE-10552: KnnVectorQuery now includes filter in equals/hashCode
2022-05-02 17:58:47 -04:00
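Leaving a field out of equals/hashCode makes two different queries compare equal, which can, for example, let a query cache return another query's results. This minimal sketch shows the pattern with a hypothetical `ToyKnnQuery` (not Lucene's KnnVectorQuery), where `filter` is a plain `Object` standing in for a Lucene `Query`.

```java
import java.util.Arrays;
import java.util.Objects;

// Hypothetical value class: every field that affects results,
// including the optional filter, takes part in equals and hashCode.
final class ToyKnnQuery {
  private final String field;
  private final float[] target;
  private final int k;
  private final Object filter; // may be null

  ToyKnnQuery(String field, float[] target, int k, Object filter) {
    this.field = Objects.requireNonNull(field);
    this.target = target.clone(); // defensive copy so equality can't change later
    this.k = k;
    this.filter = filter;
  }

  @Override
  public boolean equals(Object o) {
    if (this == o) return true;
    if (o == null || getClass() != o.getClass()) return false;
    ToyKnnQuery other = (ToyKnnQuery) o;
    return k == other.k
        && field.equals(other.field)
        && Arrays.equals(target, other.target) // compare contents, not array identity
        && Objects.equals(filter, other.filter);
  }

  @Override
  public int hashCode() {
    return Objects.hash(field, k, Arrays.hashCode(target), filter);
  }
}
```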
Kevin Risden 7efac761f4
LUCENE-10534: MinFloatFunction / MaxFloatFunction calls exists twice (#837) 2022-05-02 13:13:45 -04:00
spike.liu d9d2cb6f09
LUCENE-10188: Give SortedSetDocValues a docValueCount() (#663)
Co-authored-by: vlc刘诚 <chengliu@trip.com>
2022-05-02 10:41:12 -04:00
Tomoko Uchida 5f48469837
Allow linking to GitHub PRs from CHANGES (#854) 2022-05-02 23:06:39 +09:00
Michael McCandless 138d40e657
LUCENE-10551: improve testing of LowercaseAsciiCompression (#858) 2022-05-02 08:49:16 -04:00
Kevin Risden 3063109d83
LUCENE-10542: FieldSource exists implementations can avoid value retrieval (#847) 2022-04-29 22:43:16 -04:00
Dawid Weiss 05de9085ce
LUCENE-10539: Return a stream of completions from FSTCompletion. (#844) 2022-04-29 21:35:35 +02:00
Dawid Weiss 75aadb9589
gradle 7.3.3 quick upgrade (#856) 2022-04-29 21:02:19 +02:00
Greg Miller 902a7df0e5
LUCENE-10530: Avoid floating point precision bug in TestTaxonomyFacetAssociations (#848) 2022-04-29 08:57:46 -07:00
Ignacio Vera 0dad9ddae8
LUCENE-10508: Use MIN_WIDE_EXTENT for GeoWideDegenerateHorizontalLine (#855) 2022-04-29 10:21:08 +02:00
Dawid Weiss 6e6c61eb13 LUCENE-10541: Test-framework: limit the default length of MockTokenizer tokens to 255. 2022-04-29 09:41:42 +02:00
Tomoko Uchida c28f575b6d
LUCENE-10493: move n-best logic to analysis-common (#846) 2022-04-29 10:35:30 +09:00
Chris Hostetter 6afb9bc25a LUCENE-10292: prevent thread leak (or test timeout) if exception/assertion failure in test iterator 2022-04-28 15:17:53 -07:00
Chris Hostetter a8d86ea6e8 LUCENE-10292: Suggest: Fix FreeTextSuggester so that getCount() returns results consistent with lookup() during concurrent build()
Fix SuggestRebuildTestUtil to reliably surface this kind of failure, which was previously sporadic
2022-04-27 18:14:01 -07:00
Gautam Worah 8d9a333fac
LUCENE-10525 Improve WindowsFS emulation to catch invalid file names (#829)
* Add filename checks for WindowsFS
* don't delegate Path default methods, which makes subclassing easier. Also fix a delegation bug (endsWith was calling startsWith).
2022-04-27 09:52:47 -04:00
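The commit doesn't list which names WindowsFS rejects, so this hedged sketch follows Microsoft's documented file-naming rules instead: no `< > : " / \ | ? *` or control characters, no trailing space or period, and no reserved device names (CON, PRN, AUX, NUL, COM1 to COM9, LPT1 to LPT9), even with an extension. `WindowsNames.isInvalid` is a hypothetical helper that checks a single path component.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

final class WindowsNames {
  private static final String RESERVED_CHARS = "<>:\"/\\|?*";
  private static final Set<String> RESERVED_NAMES = new HashSet<>();

  static {
    RESERVED_NAMES.add("CON");
    RESERVED_NAMES.add("PRN");
    RESERVED_NAMES.add("AUX");
    RESERVED_NAMES.add("NUL");
    for (int i = 1; i <= 9; i++) {
      RESERVED_NAMES.add("COM" + i);
      RESERVED_NAMES.add("LPT" + i);
    }
  }

  // Returns true if 'name' (one path component) is not a legal Windows file name.
  static boolean isInvalid(String name) {
    if (name.equals(".") || name.equals("..")) return false; // navigation components
    if (name.isEmpty()) return true;
    for (int i = 0; i < name.length(); i++) {
      char c = name.charAt(i);
      if (c < 32 || RESERVED_CHARS.indexOf(c) >= 0) return true;
    }
    char last = name.charAt(name.length() - 1);
    if (last == ' ' || last == '.') return true;
    // Reserved device names are reserved regardless of extension ("nul.txt").
    int dot = name.indexOf('.');
    String base = (dot >= 0 ? name.substring(0, dot) : name).toUpperCase(Locale.ROOT);
    return RESERVED_NAMES.contains(base);
  }
}
```

Emulating these checks in a test filesystem lets such names fail on every OS instead of only on Windows CI machines.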
Ignacio Vera 922d3af8d6
LUCENE-10508: Use MIN_WIDE_EXTENT for all wide rectangles (#845) 2022-04-27 11:24:16 +02:00
Ignacio Vera 5d3ab09676
LUCENE-10470: [Tessellator] Fix some failing polygons due to collinear edges (#756)
Check if polygon has been successfully tessellated before we fail (we are failing some valid
  tessellations) and allow filtering edges that fold on top of the previous one
2022-04-27 10:24:22 +02:00
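One way to detect an edge that "folds on top of the previous one" is to check that three consecutive vertices are collinear and that the second edge points back against the first. This is a hedged sketch under that interpretation; `Fold` is hypothetical, not Lucene's Tessellator, and it uses an exact zero test where real geometry code may need a tolerance.

```java
// Hypothetical predicate: does edge b->c double back over edge a->b?
final class Fold {
  // z-component of (b - a) x (c - a); zero means a, b, c are collinear.
  static double cross(double ax, double ay, double bx, double by, double cx, double cy) {
    return (bx - ax) * (cy - ay) - (by - ay) * (cx - ax);
  }

  static boolean foldsBack(double ax, double ay, double bx, double by, double cx, double cy) {
    boolean collinear = cross(ax, ay, bx, by, cx, cy) == 0.0;
    // A negative dot product of (b - a) and (c - b) means the direction reverses.
    double dot = (bx - ax) * (cx - bx) + (by - ay) * (cy - by);
    return collinear && dot < 0;
  }
}
```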
Ignacio Vera 2b20b3f2ca
LUCENE-10508: Fix error for rectangles with an extent close to 180 degrees (#824)
This commit introduces GeoWideRectangle.MIN_WIDE_EXTENT, which takes the angular resolution
into account when building a GeoWideRectangle.
2022-04-27 07:33:49 +02:00
Greg Miller f11468186a LUCENE-10529: Fix TestTaxonomyFacetAssociations NPE when randomly indexing no documents for dim 2022-04-26 20:13:28 -07:00
Michael Sokolov 2a618586de fix path to jar file in demo documentation 2022-04-26 15:48:21 -04:00
xiaoping ebe2d7b4fd
LUCENE-10499: reduce unnecessary copy data overhead when growing array size (#786)
Co-authored-by: xiaoping.wjp <xiaoping.wjp@alibaba-inc.com>
2022-04-26 15:35:56 +02:00
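When a caller is about to overwrite a buffer anyway, growing it with a copy wastes work proportional to the old length. This hedged sketch contrasts the two options; `Grow`, `grow`, `growNoCopy` and the 1/8 over-allocation policy are hypothetical illustrations, not a claim about the exact Lucene method added by this commit.

```java
import java.util.Arrays;

final class Grow {
  // Over-allocates by about 1/8 to amortize repeated growth (assumed policy;
  // no overflow handling for sizes near Integer.MAX_VALUE).
  static int oversize(int minSize) {
    return minSize + (minSize >>> 3) + 3;
  }

  // Preserves existing contents: costs a copy of the old array.
  static int[] grow(int[] array, int minSize) {
    return array.length >= minSize ? array : Arrays.copyOf(array, oversize(minSize));
  }

  // For callers that will overwrite the buffer: allocates without copying.
  static int[] growNoCopy(int[] array, int minSize) {
    return array.length >= minSize ? array : new int[oversize(minSize)];
  }
}
```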