lucene

Commit Graph

Author	SHA1	Message	Date
Ignacio Vera	ab47db4fee	LUCENE-10437: Improve error message in the Tessellator for polygon with all points collinear (#703 ) Polygon tessellator throws a more informative error message when the provided polygon does not contain enough non-collinear points.	2022-02-23 13:51:44 +01:00
Tomoko Uchida	f8040d565f	LUCENE-10416: move changes entry to v10.0.0	2022-02-22 20:29:14 +09:00
Tomoko Uchida	c7602a425c	migrate to temurin (#697 )	2022-02-21 17:09:21 +09:00
Lu Xugang	36a2149d43	LUCENE-10424: Optimize the "everything matches" case for count query in PointRangeQuery (#691 )	2022-02-21 07:08:23 +01:00
Tomoko Uchida	76c9fd4e38	LUCENE-10416: Update Korean Dictionary to mecab-ko-dic-2.1.1-20180720 for Nori	2022-02-20 21:39:03 +09:00
Tomoko Uchida	e7a29c4c4c	Remove deprecated constructors in Nori (#695 )	2022-02-20 17:40:45 +09:00
Tomoko Uchida	58fa95deea	LUCENE-10400: revise binary dictionaries' constructor in nori (#693 )	2022-02-20 16:16:56 +09:00
Julie Tibshirani	f0d17e94d9	LUCENE-10408: Fix vector values iteration bug (#690 ) Now that there is special logic to handle the dense case, we need to adjust some assertions in VectorValues#advance.	2022-02-18 11:36:22 -08:00
Julie Tibshirani	cdb74e155a	Temporarily mute TestKnnVectorQuery#testRandomWithFilter	2022-02-17 14:50:01 -08:00
Julie Tibshirani	8ca372573d	LUCENE-10382: Support filtering in KnnVectorQuery (#656 ) This PR adds support for a query filter in KnnVectorQuery. First, we gather the query results for each leaf as a bit set. Then the HNSW search skips over the non-matching documents (using the same approach as for live docs). To prevent HNSW search from visiting too many documents when the filter is very selective, we short-circuit if HNSW has already visited more than the number of documents that match the filter, and execute an exact search instead. This bounds the number of visited documents at roughly 2x the cost of just running the exact filter, while in most cases HNSW completes successfully and does a lot better. Co-authored-by: Joel Bernstein <jbernste@apache.org>	2022-02-17 11:35:25 -08:00
Vigya Sharma	c132bbf677	LUCENE-10084: Rewrite DocValuesFieldExistsQuery to MatchAllDocsQuery when all docs have the field (#677 ) Since all documents are required to use the same features (LUCENE-9334) we can rewrite DocValuesFieldExistsQuery to a MatchAllDocsQuery whenever terms or points have a docCount that is equal to maxDoc.	2022-02-17 11:20:06 -08:00
Greg Miller	00029f1ec4	Add CHANGES entry for LUCENE-10398	2022-02-17 09:26:11 -08:00
spike.liu	fc3c790ab4	LUCENE-10398: Add static method for getting Terms from LeafReader (#678 ) Co-authored-by: chengliu@ctrip.com <chengliu@ctrip.com>	2022-02-17 09:21:51 -08:00
Mayya Sharipova	f8c5408be7	LUCENE-10408 Better encoding of doc Ids in vectors (#649 ) Better encoding of doc Ids in Lucene91HnswVectorsFormat for a dense case where all docs have vectors. Currently we write doc Ids of all documents that have vectors not very efficiently. This improves their encoding: when all documents have vectors, we don't write document IDs, but just write a single short value – a dense marker.	2022-02-17 11:34:42 +01:00
Ignacio Vera	84e34dc468	LUCENE-10415: FunctionScoreQuery and IndexOrDocValuesQuery delegate Weight#count. (#685 ) These query wrappers do not modify the set of matching documents so they can delegate Weight#count.	2022-02-17 08:03:47 +01:00
Gautam Worah	dd25fabb03	LUCENE-10378 Implement Weight#count for PointRangeQuery (#658 ) Implement Weight#count for PointRangeQuery to provide a faster way to calculate the number of matching range docs when each doc has at-most one point and the points are 1-dimensional.	2022-02-16 07:23:49 +01:00
Patrick Zhai	6157854523	LUCENE-10371 Make IndexRearranger able to arrange segment in a determined order (#630 )	2022-02-15 10:52:40 -08:00
Uwe Schindler	70c152bf32	LUCENE-10420: Remove deprecated interfaces and methods in IOUtils in main (#680 )	2022-02-14 17:05:34 +01:00
Tomoko Uchida	db8fcb84bb	LUCENE-10420: Move functional interfaces in IOUtils to top-level interfaces (#673 ) Co-authored-by: Uwe Schindler <uschindler@apache.org>	2022-02-15 00:12:28 +09:00
Dawid Weiss	8aa4763070	LUCENE-10419: fix rat thread safety bug.	2022-02-13 18:43:13 +01:00
Dawid Weiss	a861ff8df2	LUCENE-10419: revert debugging changes.	2022-02-13 18:34:57 +01:00
Dawid Weiss	50b7e2970f	LUCENE-10419: more debugging code. The message from AbstractStringBuilder suggests a concurrency issue somewhere, but I just can't see it!	2022-02-12 20:22:49 +01:00
Dawid Weiss	21c5b42063	LUCENE-10419: upgrade rat to 0.13.	2022-02-10 17:37:06 +01:00
Tomoko Uchida	4cb55a7e9c	trivial updates on github actions (#674 )	2022-02-11 01:13:18 +09:00
Luca Cavanna	ea170c9fab	Avoid SimpleText codec in TestIndexSortSortedNumericDocValuesRangeQuery (#675 ) The recently introduced testCount verifies that the Weight#count optimization kicks in. When SimpleText codec is used, `DocValues#unwrapSingleton` returns null which disables the optimization and makes the test fail.	2022-02-10 17:06:31 +01:00
Dawid Weiss	f6cebac333	LUCENE-10414: Add fn:fuzzyTerm interval function to flexible query parser (#668 )	2022-02-10 12:18:13 +01:00
Dawid Weiss	1f1da12c89	LUCENE-10419: add debugging code.	2022-02-10 12:03:54 +01:00
Adrien Grand	69d3a1d6af	LUCENE-10412: Improve handling of MatchNoDocsQuery in rewrites. (#664 )	2022-02-09 19:02:54 +01:00
Alan Woodward	2183756f1c	LUCENE-10413: Make default Ukrainian stopword set available (#665 ) This commit adds a new getDefaultStopwords() static method to UkrainianMorfologikAnalyzer, which makes it possible to create an analyzer with the default stop word set but a custom stem exclusion set.	2022-02-09 14:37:44 +00:00
Greg Miller	8178ffda00	LUCENE-10403: Add ArrayUtil#grow(T[]) (#644 )	2022-02-08 09:43:55 -08:00
Adrien Grand	ce93d45532	LUCENE-10367: Optimize CoveringQuery for the case when the minimum number of matching clauses is a constant.	2022-02-08 17:25:53 +01:00
Nhat Nguyen	bcb70fd742	LUCENE-10190: Ensure changes are visible before advancing seqno (#640 ) DocumentWriter#anyChanges() can return false after we process and generate a sequence number for an update operation; but before we adjust the numDocsInRAM. In this window of time, refreshes are noop, although the maxCompletedSequenceNumber has advanced.	2022-02-08 10:29:20 -05:00
gf2121	5250186bd1	LUCENE-10410: Add more tests for legacy decoding logic in DocIdsWriter (#654 )	2022-02-08 16:59:32 +08:00
Tomoko Uchida	20f7f33c8d	LUCENE-10400: cleanup obsolete APIs in kuromoji (#655 )	2022-02-08 09:32:33 +09:00
Julie Tibshirani	eb5bdd7d15	Rename KnnGraphValues -> HnswGraph (#645 ) This PR proposes some renames to clarify the code structure. The top-level `KnnGraphValues` is renamed to `HnswGraph`, since it now represents a hierarchical graph. It's also moved from `org.apache.lucene.index` to the `hnsw` package. Other renames: * The old `HnswGraph` -> `OnHeapHnswGraph` * `IndexedKnnGraphValues` -> `OffHeapHnswGraph` (to match `OffHeapVectorValues`)	2022-02-07 13:21:15 -08:00
Tomoko Uchida	e7546c2427	LUCENE-10400: revise binary dictionaries' constructor in kuromoji (#643 )	2022-02-07 19:31:22 +09:00
gf2121	e93b08f471	LUCENE-10315: Add CHANGES for #541 (#653 )	2022-02-07 16:23:34 +08:00
gf2121	8c67a3816b	LUCENE-10315: Speed up BKD leaf block ids codec by a 512 ints ForUtil (#541 )	2022-02-07 15:35:54 +08:00
Ignacio Vera	4c578017af	LUCENE-10405: binary and Sorted doc values are stored as BytesRef instead of BytesRefHash in memory index (#647 ) When using the MemoryIndex, binary and Sorted doc values are stored as BytesRef instead of BytesRefHash so they don't have a limit on size.	2022-02-07 07:33:07 +01:00
Greg Miller	deef3c704e	Update github hunspell regression test to use JDK 17 (#651 )	2022-02-06 08:00:31 -08:00
Gautam Worah	de4eccbb55	LUCENE-10050 Remove DrillSideways#search(DrillDownQuery,Collector) in favor of DrillSideways#search(DrillDownQuery,CollectorManager) (#632 )	2022-02-04 15:25:52 -08:00
Mayya Sharipova	ff2189c477	Add changes item for LUCENE-10054	2022-02-04 14:51:48 -05:00
Mayya Sharipova	ea4ab26e52	LUCENE-9573 Add Vectors to TestBackwardsCompatibility (#636 ) Update index.9.0.0-cfs.zip and index.9.0.0-nocfs.zip to include knn vector field.	2022-02-04 14:42:44 -05:00
Alan Woodward	6b64f4b556	LUCENE-10407: Set bpos flag to true when containing filter is exhausted (#648 ) ContainedByIntervalIterator and OverlappingIntervalIterator set their 'is the filter interval exhausted' flag to `false` once it has returned NO_MORE_POSITIONS on a document, so that subsequent calls to `startPosition()` will also return NO_MORE_POSITIONS. ContainingIntervalIterator omits to do this, and so it can incorrectly report matches, for example when used in a disjunction. This commit fixes that omission.	2022-02-04 16:44:57 +00:00
Alan Woodward	9ebee5a058	LUCENE-10402: Changes entry	2022-02-04 15:28:44 +00:00
Alan Woodward	e72d796e96	LUCENE-10402: Prefix interval automaton should be declared binary (#646 )	2022-02-04 15:27:03 +00:00
Adrien Grand	ed6c1b5aea	LUCENE-10401: Fix lookups on empty doc-values terms dictionaries. (#642 )	2022-02-04 09:28:35 +01:00
Julie Tibshirani	57d9515eff	LUCENE-10391: Reuse data structures across HnswGraph#searchLevel calls (#641 ) A couple of the data structures used in HNSW search are pretty large and expensive to allocate. This commit creates a shared candidates queue and visited set that are reused across calls to HnswGraph#searchLevel. Now the same data structures are used for building the entire graph, which can cut down on allocations during indexing. For graph building it also switches the visited set to FixedBitSet for better performance.	2022-02-03 16:00:09 -08:00
Luca Cavanna	bade484998	LUCENE-10385: Implement Weight#count on IndexSortSortedNumericDocValuesRangeQuery (#635 ) IndexSortSortedNumericDocValuesRangeQuery can implement its count method and compute count through a binary search, the same binary search that is used to execute the query itself, whenever all the required conditions are met.	2022-02-03 17:19:05 +01:00
Luca Cavanna	ee7a8d6918	LUCENE-10002: Replace some IndexSearcher#search(Collector, Query) in tests (#639 ) Also use LuceneTestCase#newSearcher	2022-02-03 17:17:02 +01:00
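
The LUCENE-10385 entry above describes computing a range count through binary search when the index is sorted on the queried field. The idea can be illustrated outside Lucene with a minimal sketch. It assumes one value per document, held in an ascending-sorted list; the function name `count_in_range` is hypothetical and is not part of any Lucene API.

```python
from bisect import bisect_left, bisect_right


def count_in_range(sorted_values, lower, upper):
    """Count values v with lower <= v <= upper in an ascending-sorted list.

    Two binary searches locate the boundaries of the matching run, so the
    count is found without visiting every matching element.
    """
    if lower > upper:
        return 0
    first = bisect_left(sorted_values, lower)   # index of first value >= lower
    last = bisect_right(sorted_values, upper)   # index just past last value <= upper
    return last - first
```

Because only the two boundary positions are needed, the cost grows with the logarithm of the list length rather than with the number of matches.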

35959 Commits

All Branches