lucene

Commit Graph

Author	SHA1	Message	Date
Dawid Weiss	1bf3cbc0b9	gradle 7.3.3 quick upgrade (#856)	2022-04-29 21:04:16 +02:00
Dawid Weiss	47ca4bc21c	LUCENE-10541: Test-framework: limit the default length of MockTokenizer tokens to 255.	2022-04-29 09:48:23 +02:00
Dawid Weiss	cb206196e4	LUCENE-10088: allow per-class override in HandleLimitFS. Bump the limit a bit for nightlies in TestIndexWriterMergePolicy (#424). Suppress SimpleTextCodec on TestIndexWriterMergePolicy (#851).	2022-04-28 21:28:50 +02:00
Nhat Nguyen	62ebb22cc0	LUCENE-10518: Relax field consistency check for old indices (#842). This change relaxes the field consistency check for old indices as we didn't enforce that in the previous versions. This commit also disables the optimization that relies on the field consistency for old indices.	2022-04-28 14:19:24 -04:00
Julie Tibshirani	9ae2181be5	Fix rare failures in TestVectorUtil cosine tests. If one of the vectors is zero, the cosine is not defined. This change makes sure the test vectors are non-zero.	2022-04-08 09:46:33 -07:00
Adrien Grand	ded9db7786	Add back-compat indices for 9.1.0.	2022-03-22 16:07:02 +01:00
Adrien Grand	53188e98e6	Add next bugfix version.	2022-03-22 15:57:31 +01:00
Adrien Grand	8a44234833	DOAP changes for release 9.1.0	2022-03-22 15:23:21 +01:00
Adrien Grand	1b890ab5f9	LUCENE-10473: Make tests a bit faster when running nightly (#754).	2022-03-21 10:38:18 +01:00
Julie Tibshirani	fcacd22a80	LUCENE-9905: Fix check in TestPerFieldKnnVectorsFormat#testMergeUsesNewFormat. Before, the assertion checked if two sets were equal, which resulted in rare failures. Now we use 'contains' from hamcrest matchers.	2022-03-19 21:10:34 -07:00
Julie Tibshirani	22a9e45f09	LUCENE-9614: Fix rare TestKnnVectorQuery failures. Some of our checks relied on doc IDs corresponding to the order in which docs were passed to IndexWriter. This is fragile and sometimes resulted in failures. Now we check against an "id" field instead.	2022-03-18 15:23:27 -07:00
Luca Cavanna	9b4003236f	LUCENE-10472: Fix TestMatchAllDocsQuery#testEarlyTermination (#753). As part of #716 I moved the test to use a collector manager, but I forgot to update one of the assertions. We can't rely on totalHits being accurate when the search is executed by multiple threads and early terminated.	2022-03-18 18:49:44 +01:00
Adrien Grand	5b522487ba	LUCENE-10469: Fix score mode propagation in ConstantScoreQuery (#750).	2022-03-16 13:19:30 +01:00
Dawid Weiss	ea989fe8f3	LUCENE-10311: avoid division by zero on small sets.	2022-03-15 12:02:34 -07:00
Luca Cavanna	a6114b532a	Revert "LUCENE-10385: Implement Weight#count on IndexSortSortedNumeri… (#745). In LUCENE-10458 we identified a bug in the logic. We're reverting on the 9.1 branch to avoid holding up the release.	2022-03-14 13:41:29 -07:00
Dawid Weiss	a796e08b1f	LUCENE-10461: fix Windows launch script for Luke so that it works with integration tests AND the actual command line. Cmd escaping rules and the start command line are absolutely insane (#743).	2022-03-12 19:44:17 +09:00
Dawid Weiss	a3a058de6d	LUCENE-10459: Update smoke tester for 9.1 (#744). Add demo dependencies to third-party modules. Add an IT that checks whether demo classes are loadable. Co-authored-by: Tomoko Uchida <tomoko.uchida.1111@gmail.com> Co-authored-by: Julie Tibshirani <julietibs@apache.org>	2022-03-11 10:46:35 -08:00
Julie Tibshirani	28f4baa511	Add missing 8.11.1 release to DOAP file	2022-03-09 14:25:00 -08:00
Mayya Sharipova	8f399572c9	LUCENE-10408: Test correction checksum (#734). Use double instead of float to test the vector values checksum.	2022-03-09 12:16:40 +00:00
Spyros Kapnissis	a033067246	LUCENE-10171: OpenNLPOpsFactory should directly cache DictionaryLemmatizer objects (#380). Instead of caching dictionary strings and building multiple redundant DictionaryLemmatizer objects. Co-authored-by: Michael Gibney <michael@michaelgibney.net>	2022-03-08 12:51:44 -05:00
Adrien Grand	b5fe307c6f	LUCENE-10311: Remove pop_XXX helpers from `BitUtil` (#724). As @rmuir noted, it would be as simple and create less cognitive overhead to use `Long#bitCount` directly.	2022-03-05 18:39:16 +01:00
Adrien Grand	1818ae9de3	LUCENE-10453: Speed up euclidean distances (#725).	2022-03-05 18:33:30 +01:00
Adrien Grand	29282fa315	LUCENE-10455: CHANGES entry.	2022-03-05 18:29:36 +01:00
Chris Lu	ffdb246702	LUCENE-10455: IndexSortSortedNumericDocValuesRangeQuery should implement Weight#scorerSupplier(LeafReaderContext) (#729)	2022-03-05 18:27:54 +01:00
Alan Woodward	5e539bc50d	LUCENE-10431: Don't include rewriteMethod in MTQ hash calculation (#727). BooleanQuery assumes that its children's hashcodes are stable, and has some assertions to this effect. This did not apply to MultiTermQuery, which has a mutable RewriteMethod member variable that was included in its hash calculation. Changing the rewrite method would change the hash, leading to assertion failures being tripped. This commit removes rewriteMethod from the hash calculation, meaning that the hashcode will be stable even under mutability.	2022-03-04 11:54:44 +00:00
Dawid Weiss	8f92ec157f	LUCENE-10447: always use utf8 for forked process encoding. Use the sa… (#717)	2022-03-03 20:57:09 +01:00
Alan Woodward	63454b83ad	LUCENE-10431: Deprecate MultiTermQuery.setRewriteMethod() (#722). Allowing users to mutate MultiTermQuery can give rise to odd bugs, for example in wrapper queries such as BooleanQuery which lazily calculate their hashcodes and then cache the result. This commit deprecates the setRewriteMethod() method on MultiTermQuery, in preparation for removing it entirely, and adds constructor parameters to the various MTQ implementations as a preferred way to set the rewrite method.	2022-03-03 11:17:02 +00:00
Adrien Grand	2a6b2ca143	LUCENE-10002: Fix test failure. When IndexSearcher is created with a threadpool it becomes impossible to assert on the number of evaluated hits overall.	2022-03-03 10:09:35 +01:00
Adrien Grand	0d35e38b93	LUCENE-10428: Avoid infinite loop under error conditions (#711). Co-authored-by: dblock <dblock@dblock.org>	2022-03-03 09:42:39 +01:00
Adrien Grand	bb10e62dff	LUCENE-10311: Make FixedBitSet#approximateCardinality faster (and actually approximate) (#710). This computes a pop count on a sample of the longs that back the bitset. Quick benchmarks suggest that this runs 5x-10x faster than `FixedBitSet#cardinality` depending on the length of the bitset.	2022-03-03 08:49:14 +01:00
Lu Xugang	1967942861	LUCENE-10450: IndexSortSortedNumericDocValuesRangeQuery could be rewritten to MatchAllDocsQuery (#720)	2022-03-02 09:29:28 +01:00
Lu Xugang	c0d5022d5a	LUCENE-10439: update CHANGES.txt (#714)	2022-03-02 09:25:12 +01:00
Anand	11e2fb8e0b	LUCENE-10237: Add MergeOnCommitTieredMergePolicy to sandbox (#446)	2022-03-02 09:21:50 +01:00
Luca Cavanna	bfe7096565	LUCENE-10002: Replace test usages of TopScoreDocCollector with a corresponding collector manager (#716). In the effort of replacing usages of IndexSearcher#search(Query, Collector) with IndexSearcher#search(Query, CollectorManager), this commit replaces many test usages of TopScoreDocCollector with its corresponding CollectorManager created by calling TopScoreDocCollector#createSharedManager.	2022-03-02 09:20:36 +01:00
Greg Miller	4c9c1c0746	LUCENE-10440: Mark TaxonomyFacets and FloatTaxonomyFacets as deprecated (#713)	2022-03-01 06:02:05 -08:00
Lu Xugang	9497524cc2	LUCENE-10442: When indexQuery and/or dvQuery is a MatchAllDocsQuery, IndexOrDocValuesQuery should be rewritten to MatchAllDocsQuery (#715)	2022-02-28 18:46:57 +01:00
Robert Muir	5972b495ba	LUCENE-10421: use Constant instead of relying upon timestamp (#686)	2022-02-25 00:39:06 -05:00
Greg Miller	81ab1d6ab6	Remove TODO for LUCENE-9952 since that issue was fixed	2022-02-24 13:46:18 -08:00
Adrien Grand	d952b3a581	LUCENE-10382: Use `IndexReaderContext#id` to check reader identity (#702). `KnnVectorQuery` currently uses the index reader's hashcode to make sure that the query it builds runs on the right reader. We had added `IndexReaderContext#id` a while back for a similar purpose with `TermStates`; let's reuse it.	2022-02-24 13:38:13 +01:00
Adrien Grand	d4cb6d0a30	LUCENE-10408: Write doc IDs of KNN vectors as ints rather than vints (#708). Since doc IDs with a vector are loaded as an int[] in memory, this changes the on-disk format of vectors to align with the in-memory representation by using ints instead of vints to represent doc IDs. This might make vectors a bit larger on disk, but also a bit faster to open. I made the same change to how we encode nodes on levels for the same reason.	2022-02-24 13:36:36 +01:00
Lu Xugang	6acf16a2e3	LUCENE-10439: Support multi-valued and multiple dimensions for count query in PointRangeQuery (#705)	2022-02-24 10:13:21 +01:00
gf2121	ad48203b55	LUCENE-10417: Revert "LUCENE-10315" (#706) (#707)	2022-02-24 16:57:35 +08:00
Julie Tibshirani	a3b136573f	LUCENE-10382: Fix testSearchWithVisitedLimit failures	2022-02-23 20:11:04 -08:00
Lu Xugang	5aab8a8e40	LUCENE-10435: add CHANGES.txt entry (#704)	2022-02-23 15:42:13 -08:00
Julie Tibshirani	29d4adfe60	LUCENE-10382: Ensure kNN filtering works with other codecs (#700). The original PR that added kNN filtering support overlooked non-default codecs. This follow-up ensures that other codecs work with the new filtering logic: (1) make sure to check the visited nodes limit in `SimpleTextKnnVectorsReader` and `Lucene90HnswVectorsReader`; (2) add a test to `BaseKnnVectorsFormatTestCase` to cover this case; (3) fix failures in `TestKnnVectorQuery#testRandomWithFilter`, whose assumptions don't hold when SimpleText is used. This PR also clarifies the limit checking logic for `Lucene91HnswVectorsReader`. Now we always check the limit before visiting a new node, whereas before we only checked it in an outer loop.	2022-02-23 14:59:16 -08:00
Lu Xugang	701e40132b	LUCENE-10435: Break loop early while checking whether DocValuesFieldExistsQuery can be rewritten to MatchAllDocsQuery (#701)	2022-02-23 18:13:28 +01:00
Julie Tibshirani	458fb1abed	LUCENE-10054: Make sure to use Lucene90 codec in unit tests (#699). Before, we were using the default Lucene91 codec, so we weren't exercising the old format.	2022-02-23 08:23:54 -08:00
Ignacio Vera	fb8d79d96a	LUCENE-10437: Improve error message in the Tessellator for polygon with all points collinear (#703). Polygon tessellator throws a more informative error message when the provided polygon does not contain enough non-collinear points.	2022-02-23 13:53:37 +01:00
Tomoko Uchida	c22d6d09d9	Revert "LUCENE-10416: Update Korean Dictionary to mecab-ko-dic-2.1.1-20180720 for Nori". This reverts commit `b2b3596466`.	2022-02-22 20:21:07 +09:00
Lu Xugang	ec4d20ac3c	LUCENE-10424: Optimize the "everything matches" case for count query in PointRangeQuery (#691)	2022-02-21 07:09:42 +01:00
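The sampling idea in the LUCENE-10311 commit `bb10e62dff` (a pop count over a sample of the longs that back the bitset, scaled up to the full length), together with the later division-by-zero fix for small sets in `ea989fe8f3`, can be illustrated with a minimal Java sketch. This is not Lucene's `FixedBitSet` code: the class name `ApproxCardinality`, the `step` sampling interval and the exact-count fallback for small sets are assumptions made for illustration.

```java
import java.util.Arrays;

// Illustrative sketch only; not Lucene's FixedBitSet implementation.
public final class ApproxCardinality {

  /** Exact number of set bits in the first numWords longs. */
  static long exactCardinality(long[] words, int numWords) {
    long count = 0;
    for (int i = 0; i < numWords; i++) {
      count += Long.bitCount(words[i]);
    }
    return count;
  }

  /**
   * Estimates the number of set bits by pop-counting every step-th long and
   * scaling the sampled count to the full length. The step parameter is an
   * assumption of this sketch, not a Lucene setting.
   */
  static long approximateCardinality(long[] words, int numWords, int step) {
    if (step <= 0) {
      throw new IllegalArgumentException("step must be positive");
    }
    // Small sets (assumed policy): a sample would cover very few words,
    // so count exactly instead of scaling (and dividing by) a tiny sample.
    if (numWords <= step) {
      return exactCardinality(words, numWords);
    }
    long sampledBits = 0;
    int sampledWords = 0;
    for (int i = 0; i < numWords; i += step) {
      sampledBits += Long.bitCount(words[i]);
      sampledWords++;
    }
    // Average set bits per sampled word, times the total number of words.
    return Math.round((double) sampledBits * numWords / sampledWords);
  }

  public static void main(String[] args) {
    long[] words = new long[1024];
    Arrays.fill(words, 0x5555555555555555L); // 32 set bits per word
    System.out.println(exactCardinality(words, words.length)); // 32768
    System.out.println(approximateCardinality(words, words.length, 16)); // 32768
  }
}
```

Because every word in this example has the same density, the estimate equals the exact count; on an irregular bitset the sampled result is only an approximation, traded for touching roughly 1/`step` of the words.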

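The hash-stability problem described in the LUCENE-10431 commits (`5e539bc50d`, `63454b83ad`) can be sketched with a hypothetical query-like class. `StableHashDemo`, `PatternQuery` and `Rewrite` are illustrative names, not Lucene classes; the sketch only shows why leaving a mutable field out of `hashCode()` keeps a previously computed or cached hash valid after the field changes.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// Hypothetical classes for illustration; not Lucene's MultiTermQuery.
public final class StableHashDemo {

  enum Rewrite { SCORING_BOOLEAN, CONSTANT_SCORE }

  static final class PatternQuery {
    final String field;
    final String pattern;
    Rewrite rewrite; // mutable, similar to what a setter like setRewriteMethod() allows

    PatternQuery(String field, String pattern, Rewrite rewrite) {
      this.field = field;
      this.pattern = pattern;
      this.rewrite = rewrite;
    }

    @Override
    public boolean equals(Object o) {
      if (!(o instanceof PatternQuery)) {
        return false;
      }
      PatternQuery other = (PatternQuery) o;
      return field.equals(other.field)
          && pattern.equals(other.pattern)
          && rewrite == other.rewrite;
    }

    @Override
    public int hashCode() {
      // The mutable rewrite field is deliberately excluded, so mutating it
      // after the object has been hashed does not change the hash code.
      return Objects.hash(field, pattern);
    }
  }

  public static void main(String[] args) {
    PatternQuery q = new PatternQuery("body", "luc*", Rewrite.SCORING_BOOLEAN);
    Set<PatternQuery> cache = new HashSet<>();
    cache.add(q);
    int before = q.hashCode();
    q.rewrite = Rewrite.CONSTANT_SCORE; // mutate after hashing
    System.out.println(before == q.hashCode()); // true
    System.out.println(cache.contains(q)); // true
  }
}
```

Keeping the mutable field in `equals` while excluding it from `hashCode` still satisfies the rule that equal objects have equal hash codes; whether Lucene's own `equals` does the same is not stated in these commit messages.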

35738 Commits (All Branches)
