lucene

Commit Graph

Author	SHA1	Message	Date
expani	b833e731dc	Avoid write/read to files per unique sequence	2024-08-07 01:37:16 +05:30
expani	d751e60836	Using charset while reading file	2024-08-07 01:25:03 +05:30
expani	c01fb9b64a	Reading docId sequences from files	2024-08-07 01:13:03 +05:30
expani	4a02e648ff	Added comments	2024-08-07 00:15:03 +05:30
expani	f35605a6b1	Added teardown to properly cleanup files	2024-08-06 23:48:48 +05:30
expani	04361d35cf	Gradle Tidy	2024-08-06 20:08:33 +05:30
expani	4e8a1f288d	Refactoring	2024-08-06 19:58:12 +05:30
expani	9f3792bab8	Added License	2024-08-06 19:54:47 +05:30
expani	1ddd597708	Removed older benchmarks	2024-08-06 19:50:41 +05:30
expani	d09e79a78e	Added a new encoder	2024-08-06 19:42:33 +05:30
expani	d2122cd38e	Increasing iterations to 10	2024-08-06 18:51:20 +05:30
expani	e06bc3b430	Explicit use of NIOFSDirectory	2024-08-06 18:36:02 +05:30
expani	5034e04eae	Initial commit - DocId Benchmark	2024-08-06 18:22:52 +05:30
expani	9e209589e5	Minor refactor	2024-06-26 15:42:36 +05:30
expani	886810beda	Using 2 step bpv 21 encoder in docIdsWriter	2024-06-26 15:40:34 +05:30
expani	2376766515	Deleted single loop encoders	2024-06-21 11:11:49 +05:30
expani	11e41b4a5d	BPV 21 Encoding in DocIdsWriter and its micro benchmark	2024-06-21 11:08:05 +05:30
zhouhui	057cbf3c86	Use getAndSet, getAndClear instead split operations. (#13507)	2024-06-19 11:53:12 +02:00
Greg Miller	937c004eda	Fix global score update bug in MultiLeafKnnCollector (#13463)	2024-06-18 18:34:15 -07:00
Michael Sokolov	27a3e71fa3	CHANGES entry for FlatVectorsFormat	2024-06-17 11:19:06 -04:00
Ignacio Vera	d29c57e50c	Fix backward codec test after introducing the doc values skip index (#13487) The introduction of the doc values skip index in #13449 broke the backward codec test as those codecs do not support it. This commit fix it by breaking up the base class for the tests.	2024-06-17 09:58:12 +02:00
Dawid Weiss	dc287862dd	Gradle build: cleanup of dependency resolution and consolidation of dependency versions (#13484)	2024-06-17 09:49:21 +02:00
Stefan Vodita	8f50976c26	Don't preserve auxiliary buffer contents in LSBRadixSorter if it grows (#12947)	2024-06-15 18:36:21 +01:00
Michael Sokolov	487d24ae69	Expose FlatVectorsFormat (#13469)	2024-06-13 19:38:24 -04:00
Ignacio Vera	048770205c	GITHUB#13449: Sparse index, optional skip list on top of doc values (#13449) Optional skip list on top of doc values which is exposed via the DocValuesSkipper abstraction. A new flag is added to FieldType.java that configures whether to create a "skip index" for doc values. Co-authored-by: Adrien Grand <jpountz@gmail.com>	2024-06-13 10:17:50 +02:00
Pulkit Gupta	1c655823dd	Mark COSINE VectorSimilarity function as deprecated (#13473)	2024-06-11 13:49:21 -04:00
Benjamin Trent	cfdc747cde	Adjust assertion check to not throw an NPE (#13479)	2024-06-11 09:48:21 -04:00
Bruno Roustant	51e60f49f8	Add missing entry in changes.txt (#13431)	2024-06-11 14:10:09 +02:00
Bruno Roustant	4e8fb2a9df	Optimize Japanese UserDictionary. (#13431) Replace TreeMap by a List of Match. Use compiled Pattern.	2024-06-11 10:02:58 +02:00
Dawid Weiss	06f86a5096	Silence odd test runner warnings after gradle upgrade (#13471)	2024-06-10 11:31:40 +02:00
Stefan Vodita	fb94403e0f	Fix typo in SimpleSortedSetFacetsExample.java	2024-06-10 08:44:47 +00:00
Paul King	5e0f549185	Fix typo in StringValueFacetCountsExample.java (#13474)	2024-06-10 09:25:27 +01:00
Hank Chang	18d48d422d	Add new test case "testGetLines" for lucene/core/analysis/WordlistLoader (#13419)	2024-06-10 09:07:26 +01:00
Armin Braun	c7a7d48d65	Reduce the heap use of BKDReader instances (#13464) We consume a lot of memory for the `indexIn` slices. If `indexIn` is of type `MemorySegmentIndexInput` the overhead of keeping loads of slices around just for cloning is far higher than the extra 12b per reader this adds (the slice description alone often costs a lot). In a number of Elasticsearch example uses with high segment counts I investigated, this change would save up to O(GB) of heap.	2024-06-07 13:27:10 +02:00
Ignacio Vera	9f8e886702	Move entry in CHANGES.txt	2024-06-07 07:29:15 +02:00
Mayya Sharipova	512ff4ac92	MultiTermQuery return null for ScoreSupplier (#13454) MultiTermQuery return null for ScoreSupplier if there are no terms in an index that match query terms. With the introduction of PR #12156 we saw degradation in performance of bool queries where one of the mandatory clauses is a TermInSetQuery with query terms not present in the field. Before for such cases TermsInSetQuery returned null for ScoreSupplier which would shortcut the whole bool query. This PR adds ability for MultiTermQuery to return null for ScoreSupplier if a field doesn't contain any query terms. Relates to PR #12156	2024-06-06 14:43:59 -04:00
Benjamin Trent	39a7eabb6d	Add back-compat indices for 9.11.0	2024-06-06 13:53:37 -04:00
Benjamin Trent	51d8d7263d	Sync CHANGES for 9.11.0	2024-06-06 12:56:10 -04:00
Ignacio Vera	58ab5b7826	Merge related HashMaps in FieldInfos#FieldNumbers into one map (#13460) Merges all immutable attributes in FieldInfos.FieldNumbers into one hashmap saving memory when writing big indices. Fixes an exotic bug when calling clear where not all attributes were cleared.	2024-06-06 17:08:05 +02:00
Sanjay Dutt	d0d2aa274f	Removed Scorer#getWeight (#13440) If Caller requires Weight then they have to keep track of Weight with which Scorer was created in the first place instead of relying on Scorer. Closes #13410	2024-06-06 16:03:19 +02:00
Adrien Grand	d5aa88bd7e	Add test for ghost fields to BaseKnnVectorQueryTestCase. (#13455)	2024-06-06 16:00:55 +02:00
panguixin	fe50e86e36	Implement Weight#count for vector values in the FieldExistsQuery (#13322) * implement Weight#count for vector values * add change log * apply review comment * apply review comment * changelog * remove null check	2024-06-05 15:02:51 -04:00
Adrien Grand	05b4639c0c	Add prefetching for doc values and norms. (#13411) This follows a similar approach as postings and only prefetches the first page of data. I verified that it works well for collectors such as `TopFieldCollector`, as `IndexSearcher` first pulls a `LeafCollector`, then a `BulkScorer` and only then starts feeding the `BulkScorer` into the `LeafCollector`. So the background I/O for the `LeafCollector` which will prefetch the first page of doc values and the background I/O for the `BulkScorer` will run in parallel.	2024-06-05 13:43:14 +02:00
Adrien Grand	846aa2f8c3	Use `ReadAdvice#NORMAL` on files that have a forward-only access pattern. (#13450) This applies to files where performing readahead could help: - Doc values data (`.dvd`) - Norms data (`.nvd`) - Docs and freqs in postings lists (`.doc`) - Points data (`.kdd`) Other files (KNN vectors, stored fields, term vectors) keep using a `RANDOM` advice.	2024-06-05 13:41:58 +02:00
Ioana Tagirta	e868b82045	Rewrite newSlowRangeQuery to MatchNoDocsQuery when upper > lower (#13425)	2024-06-04 18:38:50 +02:00
Zhang Chao	801b822972	Avoid unnecessary memory allocation in PackedLongValues#Iterator (#13439) We always allocate a long array of page size for a new PackedLongValues#Iterator instance, which is not necessary when packing a small number of values. this is more evident in the scenario of high-frequency flush operations	2024-06-04 13:03:09 +08:00
Michael Sokolov	c132e95369	mention KnnVectorsFormat in o.a.l.codecs package javadocs (#13448) Co-authored-by: Michael Sokolov <sokolovm@amazon.com>	2024-06-03 11:27:41 -04:00
Adrien Grand	edd7747370	Add prefetching support to stored fields. (#13424) This adds `StoredFields#prefetch(int)`, which mostly delegates to `IndexInput#prefetch`. Callers can take advantage of this API to parallelize I/O across multiple stored documents by first calling `StoredFields#prefetch` on all doc IDs before calling `StoredFields#document` on all doc IDs. I added a cache of recently prefetched blocks to the default codec, in order to avoid prefetching the same block multiple times in a short period of time. This felt sensible given that doc ID reordering via recursive graph bisection or index sorting are likely to result in search results being clustered.	2024-06-03 09:25:23 +02:00
Zhang Chao	a6f920d989	Fix test failure on TestPoint#testEqualsAndHashCode (#13433)	2024-06-03 10:43:01 +08:00
Benjamin Trent	a540027bde	Add new dynamic confidence interval configuration to scalar quantized format (#13445) When int4 scalar quantization was merged, it added a new way to dynamically calculate quantiles. However, when that was merged, I inadvertently changed the default behavior, where a null confidenceInterval would actually calculate the dynamic quantiles instead of doing the previous auto-setting to 1 - 1/(dim + 1). This commit formalizes the dynamic quantile calculate through setting the confidenceInterval to 0, and preserves the previous behavior for null confidenceIntervals so that users upgrading will not see different quantiles than they would expect.	2024-06-01 13:25:38 -04:00

14505 Commits