lucene

mirror of https://github.com/apache/lucene.git synced 2025-02-11 20:45:27 +00:00

Author	SHA1	Message	Date
Robert Muir	2090ac4318	LUCENE-10579: fix smoketester backwards-check to not parse stdout (#903 ) This is very noisy, can contain gradle status updates, various other tests.verbose prints from other threads, you name it. It causes the check to be flaky, and randomly "miss" seeing a test that executed. Instead, let's look at the zip files. We can still preserve the essence of what the test wants to do, but without any flakiness.	2022-05-20 07:20:57 -04:00
Adrien Grand	5e9dfbed27	LUCENE-10574: Keep allowing unbalanced merges if they would reclaim lots of deletes. (#905 ) `TestTieredMergePolicy` caught this special case: if a segment has lots of deletes, we should still allow unbalanced merges.	2022-05-20 10:06:38 +02:00
Adrien Grand	8e777a1320	Fix precommit.	2022-05-19 09:49:11 +02:00
Adrien Grand	3960c16296	LUCENE-10574: More test failures. - MergeOnFlushMergePolicy doesn't try to avoid O(n^2) merges, so I'm disabling the test on it for now. - TestUpgradeIndexMergePolicy would sometimes wrap with a non-standard merge policy like the alcoholic merge policy, I forced it to wrap a TieredMergePolicy.	2022-05-19 09:35:17 +02:00
Adrien Grand	bf07d98f13	LUCENE-10574: Fix test failure. LogDocMergePolicy would previously always force-merge an index that has 10 segments of size 1 to 10, due to the min doc count. This is not the case anymore, but the test was assuming that such an index would get merged, so I fixed the test's expectations. Also changed the merge policy to keep working when RAM buffers are flushed in such a way that segments do not appear in decreasing size order using the same logic as LogMergePolicy.	2022-05-19 09:24:54 +02:00
Adrien Grand	4240159b44	LUCENE-10574: Fix test failure. If a LogByteSizeMergePolicy is used, then it might decide to not merge the two one-document segments if their on-disk sizes are too different. Using a LogDocMergePolicy addresses the issue as both segments are always considered the same size.	2022-05-18 23:33:08 +02:00
Adrien Grand	268d29b845	LUCENE-10574: Prevent pathological merging. (#900 ) This updates TieredMergePolicy and Log(Doc\|Size)MergePolicy to only ever consider merges where the resulting segment would be at least 50% bigger than the biggest input segment. While a merge that only grows the biggest segment by 50% is still quite inefficient, this constraint is good enough to prevent pathological O(N^2) merging.	2022-05-18 23:05:54 +02:00
Alan Woodward	ac2267035a	Add next minor version 9.2.0	2022-05-18 16:37:10 +01:00
Adrien Grand	62189b2e85	LUCENE-9409: Reenable TestAllFilesDetectTruncation. (#896 ) - Removed dependency on LineFileDocs to improve reproducibility. - Relaxed the expected exception type: any exception is ok. - Ignore rare cases when a file still appears to have a well-formed footer after truncation.	2022-05-18 15:52:55 +02:00
Tomoko Uchida	34446c40c4	LUCENE-10531: small follow-up for b911d1d47	2022-05-18 09:44:06 +09:00
Tomoko Uchida	b911d1d47c	LUCENE-10531: Add @RequiresGUI test group for GUI tests (#893 ) Co-authored-by: Dawid Weiss <dawid.weiss@carrotsearch.com>	2022-05-18 09:26:06 +09:00
Adrien Grand	e65c0c777b	LUCENE-9356: Change test to detect mismatched checksums instead of byte flips. (#876 ) This makes the test more robust and gives a good sense of whether file formats are implementing `checkIntegrity` correctly.	2022-05-17 14:29:51 +02:00
Alan Woodward	8921b23bcd	LUCENE-10575: Fix some visibility issues (#894 )	2022-05-16 14:25:36 +01:00
sunwq	de8a6998d7	LUCENE-10568: fix javadocs errors in IndexWriter.DocStats (#884 )	2022-05-16 13:34:29 +09:00
Tomoko Uchida	c577508630	correct pr number in changes	2022-05-16 10:38:31 +09:00
Uwe Schindler	24ae064234	Correct issue numbers	2022-05-15 17:48:09 +02:00
Uwe Schindler	fcc6de1a1f	Add Github PR/Issue numbers to CHANGES.txt	2022-05-15 11:19:32 +02:00
Uwe Schindler	7a8071c9d4	Detect CI builds and enable errorprone by default for those CI builds (#890 )	2022-05-14 20:49:50 +02:00
Rushabh Shah	694d797526	LUCENE-10561 Reduce class/member visibility of all normalizer and stemmer classes (#883 ) Co-authored-by: Rushabh Shah <shahrs87@apache.org> Co-authored-by: Tomoko Uchida <tomoko.uchida.1111@gmail.com>	2022-05-14 12:01:19 +09:00
Greg Miller	e01b65d284	CHANGES entry for LUCENE-10488	2022-05-13 16:02:57 -07:00
Yuting Gan	f0ec226167	LUCENE-10488: Optimize Facets#getTopDims in FloatTaxonomyFacets (#806 )	2022-05-13 15:54:41 -07:00
Yuting Gan	57f8cb2fd6	LUCENE-10488: Optimize Facets#getTopDims in IntTaxonomyFacets (#779 )	2022-05-13 15:54:31 -07:00
Yuting Gan	ef43242d77	LUCENE-10488: Optimized getTopDims in ConcurrentSSDVFacetCounts (#777 )	2022-05-13 15:54:18 -07:00
Julie Tibshirani	2cca0e8441	LUCENE-10564: Fix errorprone warning This slipped through in the original commit because we only enable errorprone on nightly runs.	2022-05-12 17:31:55 -07:00
Julie Tibshirani	802f5422c0	Add CHANGES entry for LUCENE-10564	2022-05-12 13:37:47 -07:00
Julie Tibshirani	3afc9fa966	LUCENE-10564: Make sure SparseFixedBitSet#or updates memory usage (#882 ) Before, it didn't update the estimated memory usage, so calls to ramBytesUsed could be totally off.	2022-05-12 13:29:07 -07:00
Mayya Sharipova	ea5c40686f	LUCENE-10527 Use 2maxConn for last layer in HNSW (#872 ) The original HNSW paper (https://arxiv.org/pdf/1603.09320.pdf) suggests to use a different maxConn for the upper layers vs. the bottom one (which contains the full neighborhood graph). Specifically, they suggest using maxConn=M for upper layers and maxConn=2M for the bottom. This patch ensures that we follow this recommendation and use maxConn=2*M for the bottom layer.	2022-05-12 15:22:25 -04:00
Adrien Grand	8f89db8048	LUCENE-10536: Slightly better compression of doc values' terms dictionaries. (#838 ) Doc values terms dictionaries keep the first term of each block uncompressed so that they can somewhat efficiently perform binary searches across blocks. Suffixes of the other 63 terms are compressed together using LZ4 to leverage redundancy across suffixes. This change improves compression a bit by using the first (uncompressed) term of each block as a dictionary when compressing suffixes of the 63 other terms. This helps with compressing the first few suffixes when there's not much context yet that can be leveraged to find duplicates.	2022-05-12 10:32:58 +02:00
zacharymorn	96036bca9f	LUCENE-10411: Add NN vectors support to ExitableDirectoryReader (#833 )	2022-05-11 22:26:35 -07:00
Lu Xugang	a06460a538	LUCENE-10502: add changes entry (#881 )	2022-05-11 21:24:22 -04:00
Lu Xugang	6040d1648f	LUCENE-10502: Use IndexedDISI to store docIds and DirectMonotonicWriter/Reader to handle ordToDoc (#880 ) Currently vector's all docs of all fields are fully loaded into memory (for sparse cases). This happens not only when we do vector search, but also when we open an index to load meta info for vector readers. This patch instead uses IndexedDISI to store docIds and DirectMonotonicWriter/Reader to handle ordToDoc mapping. Benefits are reduced memory usage, and faster loading of meta info for vector readers.	2022-05-11 13:18:10 -04:00
Adrien Grand	54595611ae	LUCENE-10496: CHANGES entry.	2022-05-11 11:38:44 +02:00
xiaoping	e49708e01d	LUCENE-10496: avoid unnecessary attempts to evaluate skipping doc if index sort and search sort are in opposite direction (#780 )	2022-05-11 11:32:07 +02:00
xiaoping	6c6bb00cec	LUCENE-10555: fix iteratorCost initial logic error (#878 )	2022-05-11 08:36:24 +02:00
Adrien Grand	8476ac1f6a	Fix rare test failures in TestSortOptimization. The skipping logic relies on the points index telling us by how much we can reduce the candidate set by applying a filter that only matches documents that compare better than the bottom value. Some randomized points formats have large numbers of points per leaf, and produce estimates of point counts for range queries that are way above the actual value, which in-turn doesn't enable skipping when we think it should. To avoid running into this corner case, this change forces the default codec on this test.	2022-05-10 17:16:42 +02:00
xiaoping	102483bc57	fix bkd test logic error and doc error (#863 )	2022-05-10 13:10:00 +02:00
xiaoping	f431511cb7	LUCENE-10555: avoid NumericLeafComparator#iteratorCost repeated initialization when NumericLeafComparator#setScorer is called (#864 )	2022-05-10 13:07:41 +02:00
Robert Muir	3edfeb5eb2	LUCENE-10532: remove @Slow annotation (#832 ) Remove `@Slow` annotation, for more consistency with CI and local jobs. All tests can be fast!	2022-05-09 23:03:55 -04:00
Ramin ALirezaee	111d6b186e	LUCENE-10312: Add PersianStemmer (#540 ) Co-authored-by: Tomoko Uchida <tomoko.uchida.1111@gmail.com>	2022-05-07 17:09:56 +09:00
Uwe Schindler	8aa4a56491	LUCENE-10558: Implement URL ctor to support classpath/module usage in Kuromoji and Nori dictionaries (main branch) (#871 )	2022-05-06 16:49:56 +02:00
Alan Woodward	5f832c64bf	LUCENE-10436: Reinstate public getdocValuesdocIdSetIterator method on DocValues (#869 ) The method moved from DocValuesFieldExistsQuery to DocValuesIterator, but the latter is a package-private utility class, making it invisible to client code. This commit moves it back onto FieldExistsQuery, meaning that the upgrade path will be the same as for all other uses of DocValuesFieldExistsQuery.	2022-05-06 09:41:28 +01:00
Uwe Schindler	14dcc9c9ce	Disable liftbot, we have our own tools	2022-05-05 22:27:57 +02:00
Adrien Grand	26301898b2	LUCENE-10553: Fix WANDScorer's handling of 0 and +Infty. (#860 ) The computation of the scaling factor has special cases for these two values, but the current logic is backwards.	2022-05-05 10:24:28 +01:00
Tomoko Uchida	a89c57f35f	Make CONTRIBUTING.md a bit more succinct (#866 )	2022-05-05 10:35:33 +09:00
Michael Sokolov	7fbaa63dd1	LUCENE-10504: KnnGraphTester to use KnnVectorQuery (#796 ) * LUCENE-10504: KnnGraphTester to use KnnVectorQuery	2022-05-04 18:22:48 -04:00
Mayya Sharipova	87255c117d	Add change line for LUCENE-9848	2022-05-04 14:22:31 -04:00
Mayya Sharipova	dc6a7f9468	LUCENE-9848 Sort HNSW graph neighbors for construction (#862 ) * LUCENE-9848 Sort HNSW graph neighbors for construction Sort HNSW graph neighbors when applying diversity criterion During HNSW graph construction, when a node has already a number of connections larger than maximum allowed (maxConn), we need to prune its connections using a diversity criteria to limit the number of connections to maxConn. Currently when we add reverse connections to already existing nodes, we don't keep them sorted. Thus later, when we apply diversity criteria we may prune not the worst most distant non-diverse nodes. This patch makes sure that neighbours connections are always sorted from best (closest) to worst (distant), and during the application of diversity criteria processes nodes from worst to best. This path does the following: - enhance NeighborArray to always keep neighbour nodes sorted according to their scores (in desc or asc order). Make NeighborArray aware in which order the nodes should be sorted. - make OnHeapHnswGraph aware of the order of similarity function - make HnswGraphBuilder apply diversity criteria from worst to best nodes - create Lucene90NeighborArray to keep the previous logic of NeighborArray for Lucene90Codec	2022-05-04 14:15:14 -04:00
Gautam Worah	c3d47507e9	LUCENE-10524 Add benchmark suite details to CONTRIBUTING.md (#853 )	2022-05-03 12:53:20 +09:00
Lu Xugang	fe9d26178d	LUCENE-10552: KnnVectorQuery has incorrect equals/ hashCode (#859 ) * LUCENE-10552: KnnVectorQuery now includes filter in equals/ hashCode	2022-05-02 17:58:47 -04:00
Kevin Risden	7efac761f4	LUCENE-10534: MinFloatFunction / MaxFloatFunction calls exists twice (#837 )	2022-05-02 13:13:45 -04:00

1 2 3 4 5 ...

35960 Commits