lucene

Commit Graph

Author	SHA1	Message	Date
Adrien Grand	8086ef9f45	LUCENE-10455: CHANGES entry.	2022-03-05 18:32:41 +01:00
Adrien Grand	9d732380ae	LUCENE-10453: Speed up euclidean distances. (#725)	2022-03-05 18:31:56 +01:00
Chris Lu	2700c6b525	LUCENE-10455: IndexSortSortedNumericDocValuesRangeQuery should implement Weight#scorerSupplier(LeafReaderContext) (#729)	2022-03-05 18:27:29 +01:00
Alan Woodward	e049e426dd	LUCENE-10431: Remove MultiTermQuery.setRewriteMethod() (#726)	2022-03-04 11:54:02 +00:00
Dawid Weiss	81ab1e598f	LUCENE-10447: always use utf8 for forked process encoding. Use the sa… (#717)	2022-03-03 20:53:20 +01:00
Alan Woodward	3f994dec53	LUCENE-10431: Deprecate MultiTermQuery.setRewriteMethod() (#722) Allowing users to mutate MultiTermQuery can give rise to odd bugs, for example in wrapper queries such as BooleanQuery which lazily calculate their hashcodes and then cache the result. This commit deprecates the setRewriteMethod() method on MultiTermQuery, in preparation for removing it entirely, and adds constructor parameters to the various MTQ implementations as a preferred way to set the rewrite method.	2022-03-03 11:08:39 +00:00
Adrien Grand	bff4246476	LUCENE-10002: Fix test failure. When IndexSearcher is created with a threadpool it becomes impossible to assert on the number of evaluated hits overall.	2022-03-03 10:10:35 +01:00
Adrien Grand	44a2a82319	LUCENE-10428: Avoid infinite loop under error conditions. (#711) Co-authored-by: dblock <dblock@dblock.org>	2022-03-03 09:42:12 +01:00
Adrien Grand	ca73ed1c28	LUCENE-10311: Make FixedBitSet#approximateCardinality faster (and actually approximate). (#710) This computes a pop count on a sample of the longs that back the bitset. Quick benchmarks suggest that this runs 5x-10x faster than `FixedBitSet#cardinality` depending on the length of the bitset.	2022-03-03 08:48:44 +01:00
Peter Gromov	9ed526b70e	[hunspell] make SuggestionTimeoutException public to make it easier for custom checkCanceled implementations to throw it depending on their ad-hoc conditions and get partial results	2022-03-02 21:24:24 +01:00
Adrien Grand	46f9a25216	LUCENE-10237: Move CHANGES entry to 9.1.	2022-03-02 09:39:54 +01:00
Lu Xugang	e996f1d8e7	LUCENE-10450: IndexSortSortedNumericDocValuesRangeQuery could be rewrite to MatchAllDocsQuery (#720)	2022-03-02 09:28:40 +01:00
Lu Xugang	e8e522a52b	LUCENE-10439: update CHANGES.txt (#714)	2022-03-02 09:23:44 +01:00
Anand	14726dec51	LUCENE-10237 : Add MergeOnCommitTieredMergePolicy to sandbox (#446)	2022-03-02 09:19:25 +01:00
Luca Cavanna	1b083ea039	LUCENE-10002: Replace test usages of TopScoreDocCollector with a corresponding collector manager (#716) In the effort or replacing usages of IndexSearcher#search(Query, Collector) with IndexSearcher#search(Query, CollectorManager), this commit replaces many test usages of TopScoreDocCollector with its corresponding CollectorManager created by calling TopScoreDocCollector#createSharedManager.	2022-03-02 09:14:36 +01:00
Greg Miller	51797dc7f1	LUCENE-10440: Reduce visibility of TaxonomyFacets and FloatTaxonomyFacets (#712)	2022-03-01 06:02:40 -08:00
Lu Xugang	6224d0b157	LUCENE-10442: When indexQuery or/and dvQuery be a MatchAllDocsQuery then IndexOrDocValuesQuery should be rewrite to MatchAllDocsQuery (#715)	2022-02-28 18:46:32 +01:00
Robert Muir	466278e149	LUCENE-10421: use Constant instead of relying upon timestamp (#686)	2022-02-25 00:38:13 -05:00
Greg Miller	4af516a149	Remove TODO for LUCENE-9952 since that issue was fixed	2022-02-24 12:03:55 -08:00
Adrien Grand	d47ff38d70	LUCENE-10382: Use `IndexReaderContext#id` to check reader identity. (#702) `KnnVectorQuery` currently uses the index reader's hashcode to make sure that the query it builds runs on the right reader. We had added `IndexContextReader#id` a while back for a similar purpose with `TermStates`, let's reuse it?	2022-02-24 13:38:02 +01:00
Adrien Grand	44d7d962ae	LUCENE-10408: Write doc IDs of KNN vectors as ints rather than vints. (#708) Since doc IDs with a vector are loaded as an int[] in memory, this changes the on-disk format of vectors to align with the in-memory representation by using ints instead of vints to represent doc IDs. This might make vectors a bit larger on disk, but also a bit faster to open. I made the same change to how we encode nodes on levels for the same reason.	2022-02-24 13:36:10 +01:00
Lu Xugang	550d1305db	LUCENE-10439: Support multi-valued and multiple dimensions for count query in PointRangeQuery (#705)	2022-02-24 10:13:03 +01:00
gf2121	b0ca227862	LUCENE-10417: Revert "LUCENE-10315" (#706)	2022-02-24 16:41:17 +08:00
Julie Tibshirani	d9c2e46824	LUCENE-10382: Fix testSearchWithVisitedLimit failures	2022-02-23 19:56:38 -08:00
Lu Xugang	7ec89603e3	LUCENE-10435: add CHANGES.txt entry (#704)	2022-02-23 15:41:02 -08:00
Julie Tibshirani	b40a750aa8	LUCENE-10382: Ensure kNN filtering works with other codecs (#700) The original PR that added kNN filtering support overlooked non-default codecs. This follow-up ensures that other codecs work with the new filtering logic: * Make sure to check the visited nodes limit in `SimpleTextKnnVectorsReader` and `Lucene90HnswVectorsReader` * Add a test `BaseKnnVectorsFormatTestCase` to cover this case * Fix failures in `TestKnnVectorQuery#testRandomWithFilter`, whose assumptions don't hold when SimpleText is used This PR also clarifies the limit checking logic for `Lucene91HnswVectorsReader`. Now we always check the limit before visiting a new node, whereas before we only checked it in an outer loop.	2022-02-23 14:58:27 -08:00
Julie Tibshirani	4364bdd63e	LUCENE-10054: Make sure to use Lucene90 codec in unit tests (#699) Before we were using the default Lucene91 codec, so we weren't exercising the old format.	2022-02-23 08:22:59 -08:00
Lu Xugang	43e89d6a29	LUCENE-10435: Break loop early while checking whether DocValuesFieldExistsQuery can be rewrite to MatchAllDocsQuery (#701)	2022-02-23 13:53:56 +01:00
Ignacio Vera	ab47db4fee	LUCENE-10437: Improve error message in the Tessellator for polygon with all points collinear (#703) Polygon tessellator throws a more informative error message when the provided polygon does not contain enough no-collinear points.	2022-02-23 13:51:44 +01:00
Tomoko Uchida	f8040d565f	LUCENE-10416: move changes entry to v10.0.0	2022-02-22 20:29:14 +09:00
Tomoko Uchida	c7602a425c	migrate to temurin (#697)	2022-02-21 17:09:21 +09:00
Lu Xugang	36a2149d43	LUCENE-10424: Optimize the "everything matches" case for count query in PointRangeQuery (#691)	2022-02-21 07:08:23 +01:00
Tomoko Uchida	76c9fd4e38	LUCENE-10416: Update Korean Dictionary to mecab-ko-dic-2.1.1-20180720 for Nori	2022-02-20 21:39:03 +09:00
Tomoko Uchida	e7a29c4c4c	Remove deprecated constructors in Nori (#695)	2022-02-20 17:40:45 +09:00
Tomoko Uchida	58fa95deea	LUCENE-10400: revise binary dictionaries' constructor in nori (#693)	2022-02-20 16:16:56 +09:00
Julie Tibshirani	f0d17e94d9	LUCENE-10408: Fix vector values iteration bug (#690) Now that there is special logic to handle the dense case, we need to adjust some assertions in VectorValues#advance.	2022-02-18 11:36:22 -08:00
Julie Tibshirani	cdb74e155a	Temporarily mute TestKnnVectorQuery#testRandomWithFilter	2022-02-17 14:50:01 -08:00
Julie Tibshirani	8ca372573d	LUCENE-10382: Support filtering in KnnVectorQuery (#656) This PR adds support for a query filter in KnnVectorQuery. First, we gather the query results for each leaf as a bit set. Then the HNSW search skips over the non-matching documents (using the same approach as for live docs). To prevent HNSW search from visiting too many documents when the filter is very selective, we short-circuit if HNSW has already visited more than the number of documents that match the filter, and execute an exact search instead. This bounds the number of visited documents at roughly 2x the cost of just running the exact filter, while in most cases HNSW completes successfully and does a lot better. Co-authored-by: Joel Bernstein <jbernste@apache.org>	2022-02-17 11:35:25 -08:00
Vigya Sharma	c132bbf677	LUCENE-10084: Rewrite DocValuesFieldExistsQuery to MatchAllDocsQuery when all docs have the field (#677) Since all documents are required to use the same features (LUCENE-9334) we can rewrite DocValuesFieldExistsQuery to a MatchAllDocsQuery whenever terms or points have a docCount that is equal to maxDoc.	2022-02-17 11:20:06 -08:00
Greg Miller	00029f1ec4	Add CHANGES entry for LUCENE-10398	2022-02-17 09:26:11 -08:00
spike.liu	fc3c790ab4	LUCENE-10398: Add static method for getting Terms from LeafReader (#678) Co-authored-by: chengliu@ctrip.com <chengliu@ctrip.com>	2022-02-17 09:21:51 -08:00
Mayya Sharipova	f8c5408be7	LUCENE-10408 Better encoding of doc Ids in vectors (#649) Better encoding of doc Ids in Lucene91HnswVectorsFormat for a dense case where all docs have vectors. Currently we write doc Ids of all documents that have vectors not very efficiently. This improve their encoding by for a case when all documents have vectors, we don't write document IDs, but just write a single short value – a dense marker.	2022-02-17 11:34:42 +01:00
Ignacio Vera	84e34dc468	LUCENE-10415: FunctionScoreQuery and IndexOrDocValuesQuery delegate Weight#count. (#685) These query wrappers do not modify the set of matching documents so they can delegate Weight#count.	2022-02-17 08:03:47 +01:00
Gautam Worah	dd25fabb03	LUCENE-10378 Implement Weight#count for PointRangeQuery (#658) Implement Weight#count for PointRangeQuery to provide a faster way to calculate the number of matching range docs when each doc has at-most one point and the points are 1-dimensional.	2022-02-16 07:23:49 +01:00
Patrick Zhai	6157854523	LUCENE-10371 Make IndexRearranger able to arrange segment in a determined order (#630)	2022-02-15 10:52:40 -08:00
Uwe Schindler	70c152bf32	LUCENE-10420: Remove deprecated interfaces and methods in IOUtils in main (#680)	2022-02-14 17:05:34 +01:00
Tomoko Uchida	db8fcb84bb	LUCENE-10420: Move functional interfaces in IOUtils to top-level interfaces (#673) Co-authored-by: Uwe Schindler <uschindler@apache.org>	2022-02-15 00:12:28 +09:00
Dawid Weiss	8aa4763070	LUCENE-10419: fix rat thread safety bug.	2022-02-13 18:43:13 +01:00
Dawid Weiss	a861ff8df2	LUCENE-10419: revert debugging changes.	2022-02-13 18:34:57 +01:00
Dawid Weiss	50b7e2970f	LUCENE-10419: more debugging code. The message from AbstractStringBuilder suggests a concurrency issue somewhere, but I just can't see it!	2022-02-12 20:22:49 +01:00

35887 Commits
