lucene

Commit Graph

Author	SHA1	Message	Date
sunwq	de8a6998d7	LUCENE-10568: fix javadocs errors in IndexWriter.DocStats (#884 )	2022-05-16 13:34:29 +09:00
Tomoko Uchida	c577508630	correct pr number in changes	2022-05-16 10:38:31 +09:00
Uwe Schindler	24ae064234	Correct issue numbers	2022-05-15 17:48:09 +02:00
Uwe Schindler	fcc6de1a1f	Add Github PR/Issue numbers to CHANGES.txt	2022-05-15 11:19:32 +02:00
Uwe Schindler	7a8071c9d4	Detect CI builds and enable errorprone by default for those CI builds (#890 )	2022-05-14 20:49:50 +02:00
Rushabh Shah	694d797526	LUCENE-10561 Reduce class/member visibility of all normalizer and stemmer classes (#883 ) Co-authored-by: Rushabh Shah <shahrs87@apache.org> Co-authored-by: Tomoko Uchida <tomoko.uchida.1111@gmail.com>	2022-05-14 12:01:19 +09:00
Greg Miller	e01b65d284	CHANGES entry for LUCENE-10488	2022-05-13 16:02:57 -07:00
Yuting Gan	f0ec226167	LUCENE-10488: Optimize Facets#getTopDims in FloatTaxonomyFacets (#806 )	2022-05-13 15:54:41 -07:00
Yuting Gan	57f8cb2fd6	LUCENE-10488: Optimize Facets#getTopDims in IntTaxonomyFacets (#779 )	2022-05-13 15:54:31 -07:00
Yuting Gan	ef43242d77	LUCENE-10488: Optimized getTopDims in ConcurrentSSDVFacetCounts (#777 )	2022-05-13 15:54:18 -07:00
Julie Tibshirani	2cca0e8441	LUCENE-10564: Fix errorprone warning This slipped through in the original commit because we only enable errorprone on nightly runs.	2022-05-12 17:31:55 -07:00
Julie Tibshirani	802f5422c0	Add CHANGES entry for LUCENE-10564	2022-05-12 13:37:47 -07:00
Julie Tibshirani	3afc9fa966	LUCENE-10564: Make sure SparseFixedBitSet#or updates memory usage (#882 ) Before, it didn't update the estimated memory usage, so calls to ramBytesUsed could be totally off.	2022-05-12 13:29:07 -07:00
Mayya Sharipova	ea5c40686f	LUCENE-10527 Use 2*maxConn for last layer in HNSW (#872 ) The original HNSW paper (https://arxiv.org/pdf/1603.09320.pdf) suggests using a different maxConn for the upper layers vs. the bottom one (which contains the full neighborhood graph). Specifically, they suggest using maxConn=M for upper layers and maxConn=2M for the bottom. This patch ensures that we follow this recommendation and use maxConn=2*M for the bottom layer.	2022-05-12 15:22:25 -04:00
Adrien Grand	8f89db8048	LUCENE-10536: Slightly better compression of doc values' terms dictionaries. (#838 ) Doc values terms dictionaries keep the first term of each block uncompressed so that they can somewhat efficiently perform binary searches across blocks. Suffixes of the other 63 terms are compressed together using LZ4 to leverage redundancy across suffixes. This change improves compression a bit by using the first (uncompressed) term of each block as a dictionary when compressing suffixes of the 63 other terms. This helps with compressing the first few suffixes when there's not much context yet that can be leveraged to find duplicates.	2022-05-12 10:32:58 +02:00
zacharymorn	96036bca9f	LUCENE-10411: Add NN vectors support to ExitableDirectoryReader (#833 )	2022-05-11 22:26:35 -07:00
Lu Xugang	a06460a538	LUCENE-10502: add changes entry (#881 )	2022-05-11 21:24:22 -04:00
Lu Xugang	6040d1648f	LUCENE-10502: Use IndexedDISI to store docIds and DirectMonotonicWriter/Reader to handle ordToDoc (#880 ) Currently, all docs of all vector fields are fully loaded into memory (for sparse cases). This happens not only when we do vector search, but also when we open an index to load meta info for vector readers. This patch instead uses IndexedDISI to store docIds and DirectMonotonicWriter/Reader to handle ordToDoc mapping. Benefits are reduced memory usage, and faster loading of meta info for vector readers.	2022-05-11 13:18:10 -04:00
Adrien Grand	54595611ae	LUCENE-10496: CHANGES entry.	2022-05-11 11:38:44 +02:00
xiaoping	e49708e01d	LUCENE-10496: avoid unnecessary attempts to evaluate skipping doc if index sort and search sort are in opposite direction (#780 )	2022-05-11 11:32:07 +02:00
xiaoping	6c6bb00cec	LUCENE-10555: fix iteratorCost initial logic error (#878 )	2022-05-11 08:36:24 +02:00
Adrien Grand	8476ac1f6a	Fix rare test failures in TestSortOptimization. The skipping logic relies on the points index telling us by how much we can reduce the candidate set by applying a filter that only matches documents that compare better than the bottom value. Some randomized points formats have large numbers of points per leaf, and produce estimates of point counts for range queries that are way above the actual value, which in turn doesn't enable skipping when we think it should. To avoid running into this corner case, this change forces the default codec on this test.	2022-05-10 17:16:42 +02:00
xiaoping	102483bc57	fix bkd test logic error and doc error (#863 )	2022-05-10 13:10:00 +02:00
xiaoping	f431511cb7	LUCENE-10555: avoid NumericLeafComparator#iteratorCost repeated initialization when NumericLeafComparator#setScorer is called (#864 )	2022-05-10 13:07:41 +02:00
Robert Muir	3edfeb5eb2	LUCENE-10532: remove @Slow annotation (#832 ) Remove `@Slow` annotation, for more consistency with CI and local jobs. All tests can be fast!	2022-05-09 23:03:55 -04:00
Ramin ALirezaee	111d6b186e	LUCENE-10312: Add PersianStemmer (#540 ) Co-authored-by: Tomoko Uchida <tomoko.uchida.1111@gmail.com>	2022-05-07 17:09:56 +09:00
Uwe Schindler	8aa4a56491	LUCENE-10558: Implement URL ctor to support classpath/module usage in Kuromoji and Nori dictionaries (main branch) (#871 )	2022-05-06 16:49:56 +02:00
Alan Woodward	5f832c64bf	LUCENE-10436: Reinstate public getdocValuesdocIdSetIterator method on DocValues (#869 ) The method moved from DocValuesFieldExistsQuery to DocValuesIterator, but the latter is a package-private utility class, making it invisible to client code. This commit moves it back onto FieldExistsQuery, meaning that the upgrade path will be the same as for all other uses of DocValuesFieldExistsQuery.	2022-05-06 09:41:28 +01:00
Uwe Schindler	14dcc9c9ce	Disable liftbot, we have our own tools	2022-05-05 22:27:57 +02:00
Adrien Grand	26301898b2	LUCENE-10553: Fix WANDScorer's handling of 0 and +Infty. (#860 ) The computation of the scaling factor has special cases for these two values, but the current logic is backwards.	2022-05-05 10:24:28 +01:00
Tomoko Uchida	a89c57f35f	Make CONTRIBUTING.md a bit more succinct (#866 )	2022-05-05 10:35:33 +09:00
Michael Sokolov	7fbaa63dd1	LUCENE-10504: KnnGraphTester to use KnnVectorQuery (#796 )	2022-05-04 18:22:48 -04:00
Mayya Sharipova	87255c117d	Add change line for LUCENE-9848	2022-05-04 14:22:31 -04:00
Mayya Sharipova	dc6a7f9468	LUCENE-9848 Sort HNSW graph neighbors for construction (#862 ) Sort HNSW graph neighbors when applying diversity criterion. During HNSW graph construction, when a node already has more connections than the maximum allowed (maxConn), we need to prune its connections using a diversity criterion to limit the number of connections to maxConn. Currently, when we add reverse connections to already existing nodes, we don't keep them sorted. Thus later, when we apply the diversity criterion, we may fail to prune the worst (most distant) non-diverse nodes. This patch makes sure that neighbour connections are always sorted from best (closest) to worst (distant), and that the diversity criterion processes nodes from worst to best. This patch does the following: - enhance NeighborArray to always keep neighbour nodes sorted according to their scores (in desc or asc order). Make NeighborArray aware in which order the nodes should be sorted. - make OnHeapHnswGraph aware of the order of similarity function - make HnswGraphBuilder apply diversity criteria from worst to best nodes - create Lucene90NeighborArray to keep the previous logic of NeighborArray for Lucene90Codec	2022-05-04 14:15:14 -04:00
Gautam Worah	c3d47507e9	LUCENE-10524 Add benchmark suite details to CONTRIBUTING.md (#853 )	2022-05-03 12:53:20 +09:00
Lu Xugang	fe9d26178d	LUCENE-10552: KnnVectorQuery has incorrect equals/ hashCode (#859 ) * LUCENE-10552: KnnVectorQuery now includes filter in equals/ hashCode	2022-05-02 17:58:47 -04:00
Kevin Risden	7efac761f4	LUCENE-10534: MinFloatFunction / MaxFloatFunction calls exists twice (#837 )	2022-05-02 13:13:45 -04:00
spike.liu	d9d2cb6f09	LUCENE-10188: Give SortedSetDocValues a docValueCount() (#663 ) Co-authored-by: vlc刘诚 <chengliu@trip.com>	2022-05-02 10:41:12 -04:00
Tomoko Uchida	5f48469837	Allow to link to github PR from changes (#854 )	2022-05-02 23:06:39 +09:00
Michael McCandless	138d40e657	LUCENE-10551: improve testing of LowercaseAsciiCompression (#858 )	2022-05-02 08:49:16 -04:00
Kevin Risden	3063109d83	LUCENE-10542: FieldSource exists implementations can avoid value retrieval (#847 )	2022-04-29 22:43:16 -04:00
Dawid Weiss	05de9085ce	LUCENE-10539: Return a stream of completions from FSTCompletion. (#844 )	2022-04-29 21:35:35 +02:00
Dawid Weiss	75aadb9589	gradle 7.3.3 quick upgrade (#856 )	2022-04-29 21:02:19 +02:00
Greg Miller	902a7df0e5	LUCENE-10530: Avoid floating point precision bug in TestTaxonomyFacetAssociations (#848 )	2022-04-29 08:57:46 -07:00
Ignacio Vera	0dad9ddae8	LUCENE-10508: Use MIN_WIDE_EXTENT for GeoWideDegenerateHorizontalLine (#855 )	2022-04-29 10:21:08 +02:00
Dawid Weiss	6e6c61eb13	LUCENE-10541: Test-framework: limit the default length of MockTokenizer tokens to 255.	2022-04-29 09:41:42 +02:00
Tomoko Uchida	c28f575b6d	LUCENE-10493: move n-best logic to analysis-common (#846 )	2022-04-29 10:35:30 +09:00
Chris Hostetter	6afb9bc25a	LUCENE-10292: prevent thread leak (or test timeout) if exception/assertion failure in test iterator	2022-04-28 15:17:53 -07:00
Chris Hostetter	a8d86ea6e8	LUCENE-10292: Suggest: Fix FreeTextSuggester so that getCount() returned results consistent with lookup() during concurrent build(). Fix SuggestRebuildTestUtil to reliably surface this kind of failure that was previously sporadic	2022-04-27 18:14:01 -07:00
Gautam Worah	8d9a333fac	LUCENE-10525 Improve WindowsFS emulation to catch invalid file names (#829 ) * Add filename checks for WindowsFS * don't delegate Path default methods, which makes it easier for subclassing. Also fix delegation bug (endsWith was calling startsWith).	2022-04-27 09:52:47 -04:00

36097 Commits

All Branches