lucene

mirror of https://github.com/apache/lucene.git synced 2025-02-08 11:05:29 +00:00

Author	SHA1	Message	Date
Tomoko Uchida	e61958e4fd	links to github should be '/issues'	2022-08-27 11:54:20 +09:00
Dawid Weiss	4f7543725c	#11720 Upgrade randomizedtesting to 2.8.1 (#11721)	2022-08-26 00:01:57 +02:00
Mike Drob	dbc7a9764a	Add Integer awareness to RamUsageEstimator.sizeOf (#11715) Additionally, update comments to reflect that we have not been VM cache-aware for a long time now.	2022-08-25 15:18:08 -05:00
Uwe Schindler	1d54299011	Fix classloading deadlock in analysis factories / AnalysisSPILoader initialization. This closes #11701 (#11718)	2022-08-25 18:16:04 +02:00
Tomoko Uchida	53b1ce7504	update contributing guide for GH issue (#11716)	2022-08-25 04:06:09 +09:00
Greg Miller	1529606763	Optimize TermInSetQuery for terms that match all docs in a segment (#1062)	2022-08-23 08:37:44 -07:00
Michael Sokolov	8021c2db4e	Don't throw an exception for byte-encoded vectors in SimpleText codec	2022-08-22 08:29:58 -04:00
Julie Tibshirani	df67223497	Disable byte encoding in TestSimpleTextKnnVectorsFormat	2022-08-21 17:00:57 -07:00
Julie Tibshirani	653d2ebf71	Remove KnnVectorsFormat#currentVersion (#1077) These internal versions only make sense within a codec definition, and aren't meant to be exposed and compared across codecs. Since this method is only used in tests, we can move the check to the test classes instead.	2022-08-21 13:09:07 -07:00
Michael Sokolov	daa56d30f0	Fix TestHnswGraph rare failure	2022-08-20 17:26:50 -04:00
Michael Sokolov	0a58318e16	Fix for bad cast when sorting a KnnVectors index over BytesRef (#1074)	2022-08-20 17:23:47 -04:00
Michael Sokolov	798c02dd70	fix VectorUtil.dotProductScore normalization (#1073)	2022-08-20 09:15:38 -04:00
Michael Sokolov	60fa19d509	don't call BitSet.cardinality() more than needed (#1075)	2022-08-20 08:40:50 -04:00
Michael Sokolov	f9680c6807	Add safety checks to KnnVectorField; fixed issue with copying BytesRef (#1076)	2022-08-20 08:38:42 -04:00
Tomoko Uchida	9ae3498f82	add notes about labels' color code	2022-08-20 13:22:50 +09:00
Julie Tibshirani	8308688d78	LUCENE-9583: Remove RandomAccessVectorValuesProducer (#1071) This change folds the `RandomAccessVectorValuesProducer` interface into `RandomAccessVectorValues`. This reduces the number of interfaces and clarifies the cloning/copying behavior. This is a small simplification related to LUCENE-9583, but does not address the main issue.	2022-08-19 18:04:05 -07:00
Yuting Gan	0914b537db	LUCENE-10644: Facets#getAllChildren testing should ignore child order (#1013)	2022-08-18 10:38:49 -07:00
Julie Tibshirani	7912ed02c4	Move Lucene91HnswGraphBuilder to test folder It's only used in unit tests so it can live in the backwards_codecs tests.	2022-08-17 17:10:38 -07:00
Tomoko Uchida	8b3303b25f	.asf.yaml	2022-08-16 20:02:47 +09:00
Michael Sokolov	bc214d4958	standardize exception text for vector dimension mismatch (in SimpleText codec)	2022-08-13 13:12:11 -04:00
Nick Knize	543910d900	LUCENE-10654: Fix ShapeDocValue Bounding Box failure (#1066) The base spatial test case may create invalid self crossing polygons. These polygons are cleaned by the tessellator which may result in an inconsistent bounding box between the tessellated shape and the original, invalid, geometry. This commit fixes the shape doc value test case to compute the bounding box from the cleaned geometry instead of relying on the, potentially invalid, original geometry. Signed-off-by: Nicholas Walter Knize <nknize@apache.org>	2022-08-12 10:54:22 -05:00
Ignacio Vera	fe8d11254a	LUCENE-10678: Fix potential overflow when computing the partition point on the BKD tree (#1065) We currently compute the partition point for a set of points by multiplying the number of nodes that needs to be on the left of the BKD tree by the maxPointsInLeafNode. This multiplication is done on the integer space so if the partition point is bigger than Integer.MAX_VALUE it will overflow. This commit moves the multiplication to the long space so it doesn't overflow.	2022-08-11 15:25:53 +02:00
Michael Sokolov	a693fe819b	LUCENE-10577: enable quantization of HNSW vectors to 8 bits (#1054) * LUCENE-10577: enable supplying, storing, and comparing HNSW vectors with 8 bit precision	2022-08-10 17:09:07 -04:00
Vigya Sharma	59a0917e25	Fix typo in PostingsReaderBase docstring (#948) * remove extra PostingsEnum from docstring * add ImpactsEnum to docstring	2022-08-09 16:20:51 -07:00
Nick Knize	d7fd48c950	LUCENE-10654: Add new ShapeDocValuesField for LatLonShape and XYShape (#1017) Adds new doc value field to support LatLonShape and XYShape doc values. The implementation is inspired by ComponentTree. A binary tree of tessellated components (point, line, or triangle) is created. This tree is then DFS serialized to a variable compressed DataOutput buffer to keep the doc value format as compact as possible. DocValue queries are performed on the serialized tree using a similar component relation logic as found in SpatialQuery for BKD indexed shapes. To make this possible some of the relation logic is refactored to make it accessible to the doc value query counterpart. Note this does not support the following: * Multi Geometries or Collections - This will be investigated by exploring the addition of multi binary doc values. * General Geometry Queries - This will be added in a follow on improvement. Signed-off-by: Nicholas Walter Knize <nknize@apache.org>	2022-08-09 12:51:45 -05:00
Tomoko Uchida	0eba72f625	disable GH issue	2022-08-09 15:26:55 +09:00
tang donghai	b08e34722d	LUCENE-10646: Add some comment on LevenshteinAutomata (#1016) * add Comment on Lev & pretty the toDot * use auto generate scripts to add comment * update checksum * update checksum * restore toDot * add removeDeadStates in levAutomata Co-authored-by: tangdonghai <tangdonghai@meituan.com>	2022-08-07 10:01:30 -04:00
Ignacio Vera	bd0718f071	LUCENE-10673: Improve check of equality for latitudes for spatial3d GeoBoundingBox (#1056)	2022-08-04 06:47:27 +02:00
luyuncheng	34154736c6	LUCENE-10627: Using ByteBuffersDataInput reduce memory copy on compressing data (#987)	2022-08-01 18:34:41 +02:00
Adrien Grand	04e4f317cb	LUCENE-10629: Fix NullPointerException. I hit a NPE while running tests. `Weight#scorer` may return `null`, but not `Scorer#iterator`.	2022-08-01 14:13:22 +02:00
Shai Erera	7ac75135b9	[LUCENE-10629]: Add fast match query support to FacetSets (#1015)	2022-07-31 07:50:03 +03:00
Dawid Weiss	f93e52e5bb	LUCENE-10669: The build should be more helpful when generated resources are touched (#1053)	2022-07-30 20:45:32 +02:00
Adrien Grand	7c9d3cd6ff	LUCENE-10633: Fix handling of missing values in reverse sorts.	2022-07-29 21:36:35 +02:00
Kaival Parikh	1ad28a3136	LUCENE-10559: Add Prefilter Option to KnnGraphTester (#932) Added a `prefilter` and `filterSelectivity` argument to KnnGraphTester to be able to compare pre and post-filtering benchmarks. `filterSelectivity` expresses the selectivity of a filter as proportion of passing docs that are randomly selected. We store these in a FixedBitSet and use this to calculate true KNN as well as in HNSW search. In case of post-filter, we over-select results as `topK / filterSelectivity` to get final hits close to actual requested `topK`. For pre-filter, we wrap the FixedBitSet in a query and pass it as prefilter argument to KnnVectorQuery.	2022-07-29 11:21:34 -07:00
Adrien Grand	eb7b7791ba	LUCENE-10633: Dynamic pruning for sorting on SORTED(_SET) fields. (#1023) This commit enables dynamic pruning for queries sorted on SORTED(_SET) fields by using postings to filter competitive documents.	2022-07-29 11:12:32 +02:00
iverase	e1d2005df4	Add back-compat indices for 9.3.0	2022-07-29 10:13:20 +02:00
iverase	52a41d702f	DOAP changes for release 9.3.0	2022-07-29 09:40:53 +02:00
Greg Miller	4ebc249dbc	Add #scoreSupplier support to DocValuesRewriteMethod along with singleton doc value opto (#1020)	2022-07-28 11:12:21 -07:00
Shiming Li	bb752c774c	LUCENE-10663: Fix KnnVectorQuery explain with multiple segments (#1050) If there are multiple segments. KnnVectorQuery explain has a bug in locating the doc ID. This is because the doc ID in explain is the docBase without the segment. In KnnVectorQuery.DocAndScoreQuery docs docid is increased in each segment of the docBase. So, in the 'DocAndScoreQuery.explain', needs to be added with the segment's docBase. Co-authored-by: Julie Tibshirani <julietibs@apache.org>	2022-07-28 10:31:49 -07:00
Adrien Grand	0ff987562a	LUCENE-10661: Move CHANGES entry to 9.4.	2022-07-27 16:20:20 +02:00
luyuncheng	107747f359	LUCENE-10661: Reduce memory copy in BytesStore (#1047)	2022-07-27 16:17:08 +02:00
Weiming Wu	2cf12b8cdc	Cache decoded bytes for TFIDFSimilarity scorer. (#1042) Co-authored-by: Weiming Wu <wweiming@amazon.com>	2022-07-26 13:47:52 +02:00
tang donghai	94960a0aff	precompute maxlevel in LogMergePolicy (#1045)	2022-07-26 13:42:32 +02:00
Mayya Sharipova	2efc204a39	LUCENE-10592 Strengthen TestHnswGraph::testSortedAndUnsortedIndicesReturnSameResults This test occasionally fails if knn search returns only 1 document in the index, as we have an assertion that returned doc IDs from sorted and unsorted index must be different. This patch ensures that we have many documents in the index, so that knn search always returns enough results.	2022-07-25 09:48:43 -04:00
Greg Miller	f943a57ebe	Fix another TestDisiPriorityQueue bug	2022-07-22 14:32:08 -07:00
Mayya Sharipova	bd06cebfc2	Add change log for LUCENE-10592	2022-07-22 12:14:58 -04:00
Mayya Sharipova	fdbb76a8d7	Add next minor version 9.3.0	2022-07-22 12:01:08 -04:00
Mayya Sharipova	ba4bc04271	LUCENE-10592 Build HNSW Graph on indexing (#992) Currently, when indexing knn vectors, we buffer them in memory and on flush during a segment construction we build an HNSW graph. As building an HNSW graph is very expensive, this makes flush operation take a lot of time. This also makes overall indexing performance quite unpredictable – some indexing operations return almost instantly while others that trigger flush take a lot of time. This happens because flushes are unpredictable and triggered by memory used, presence of concurrent searches etc. Building an HNSW graph as we index documents avoids these problems, as the load of HNSW graph construction is spread evenly during indexing. Co-authored-by: Adrien Grand <jpountz@gmail.com>	2022-07-22 11:29:28 -04:00
Mayya Sharipova	bd360f9b3e	Create Lucene94 Codec and move Lucene92 to backwards_codecs (#1041)	2022-07-22 10:04:10 -04:00
Michael Sokolov	6bdeb141b7	Revert "Create Lucene93 Codec and move Lucene92 to backwards_codecs (#924)" This reverts commit f4f4a159b77ffca974c003aba5d6b33a3b40be97.	2022-07-21 12:52:42 -04:00

36126 Commits