lucene

mirror of https://github.com/apache/lucene.git synced 2025-02-12 04:55:27 +00:00

Author	SHA1	Message	Date
Julie Tibshirani	607b10dc2a	LUCENE-10069: Document that kNN queries might not return all results (#434 ) Performing a kNN search with very large k may return fewer than k documents. This is due to the fact that the HNSW graph is not guaranteed to be connected. This commit documents the behavior as part of a general warning that the results of a kNN search may be approximate.	2021-11-12 14:20:09 -08:00
Julie Tibshirani	68be365283	LUCENE-10063: Fix score calculation in SimpleTextKnnVectorsFormat The method VectorSimilarityFunction#convertToScore already reverses the similarity, so we shouldn't reverse it again.	2021-11-11 11:36:50 -08:00
Julie Tibshirani	9c73562161	LUCENE-10228: Ensure PerFieldKnnVectorsFormat uses right format name (#432 ) Before, when creating a KnnVectorsWriter for merging, we consulted the existing "PER_FIELD_SUFFIX_KEY" attribute to determine the format's per-field suffix. This isn't correct since we could be using a new codec (that produces different formats/suffixes). This commit modifies TestPerFieldDocValuesFormat#testMergeUsesNewFormat to trigger the problem. Without the fix, it throws an error like "java.nio.file.FileAlreadyExistsException: File "_3_Lucene90HnswVectorsFormat_0.vem" was already written to."	2021-11-11 11:22:52 -08:00
Dawid Weiss	ff9ee28c60	LUCENE-10223: interval support in standard syntax parser (#429 )	2021-11-11 08:56:48 +01:00
Dawid Weiss	238cd5fd0c	LUCENE-10226: test target creates a weird folder (lazy property).	2021-11-09 08:38:42 +01:00
Dawid Weiss	ffe40d23e1	LUCENE-10222: Enable github precommit check workflow on branch_9x	2021-11-05 09:01:45 +01:00
Dawid Weiss	5de05f3556	LUCENE-10220: Add a utility method to get IntervalSource from analyzed text (or token stream) (#427 )	2021-11-05 08:58:37 +01:00
Uwe Schindler	6ccee3204f	LUCENE-10218: Extend validateSourcePatterns task to scan for LTR/RTL unicode to catch "Trojan Source" source code attacks (#425 ) Co-authored-by: Dawid Weiss <dawid.weiss@carrotsearch.com> # Conflicts: # gradle/validation/validate-source-patterns.gradle	2021-11-03 17:21:15 +01:00
Adrien Grand	5fa093bdba	Format javadocs of new versions in a way that Spotless is happy with.	2021-11-02 13:23:45 +01:00
Adrien Grand	713385004f	Add next minor version 9.1.0	2021-11-02 13:20:20 +01:00
Adrien Grand	cc2a31f2be	LUCENE-10103: Move CHANGES entry to correct version.	2021-11-02 10:35:55 +01:00
Bruno Roustant	63b9e603e6	LUCENE-10196: Improve IntroSorter with 3-way partitioning.	2021-11-01 10:55:44 +01:00
Dawid Weiss	0544819b78	LUCENE-10200: store git revision in the release folder and read it back from buildAndPushRelease (#419 )	2021-11-01 09:29:06 +01:00
Dawid Weiss	1d152c5f67	LUCENE-10192: drop jars from binary distribution and an aggregate merge of related minor tasks.	2021-10-31 10:50:11 +01:00
Dawid Weiss	98b17952f9	LUCENE-10213: Use unicode escapes in message property files in Luke (remove hacks)	2021-10-31 10:41:54 +01:00
Dawid Weiss	ded915b29b	LUCENE-10192: Use modules instead of classpath for binary distribution testing.	2021-10-31 10:41:49 +01:00
Dawid Weiss	01839da593	LUCENE-10192: Adjust checks to the new binary file structure.	2021-10-31 10:41:42 +01:00
Dawid Weiss	d23f37d02d	LUCENE-10200: The branch does not have to be on origin remote. Replace this logic with a check whether the branch is up to date with the remote.	2021-10-31 10:41:32 +01:00
Dawid Weiss	6d8ea58ccd	LUCENE-10200: Rename pddl-10.txt to reference glove.	2021-10-31 10:41:16 +01:00
Dawid Weiss	7f7007966e	LUCENE-10192: No need for hacky classpath, add the log4j module to the root set. Automatic modules have access to all other modules by default.	2021-10-31 10:41:10 +01:00
Dawid Weiss	627ef4d469	LUCENE-9978: Integrate Luke with the binary release package.	2021-10-31 10:40:51 +01:00
Dawid Weiss	39d388330c	LUCENE-10192: Move the test framework to a separate top-level folder. I'm not even sure it really needs to be in the binary distribution but it is distinctly different from the rest of the modules.	2021-10-31 10:40:45 +01:00
Dawid Weiss	fda47a24f8	LUCENE-10192: Flatten the modules into a single jar folder to allow --module-path to be used. So much simpler.	2021-10-31 10:40:23 +01:00
Dawid Weiss	bcdfc4c8c9	LUCENE-10192: drop third party jars from the binary distribution.	2021-10-31 10:36:33 +01:00
Michael Sokolov	84a4797d14	Apply query score conversion to vector similarities in SimpleTextKnnVectorReader	2021-10-30 21:26:17 -04:00
David Smiley	c2c215d3a8	LUCENE-10201: Upgrade Spatial4j to 0.8 (#409 ) Upgrading Spatial4j to 0.8 improves a variety of minor things. See release notes: https://github.com/locationtech/spatial4j/releases/tag/spatial4j-0.8 Test-only dependency on JTS is upgraded to 1.17 as well.	2021-10-29 22:01:52 -04:00
Mike Drob	23256a30fa	Replace deprecated Gradle 7.2 properties (#417 )	2021-10-29 09:59:47 -05:00
Adrien Grand	53b40e0fb7	LUCENE-10145: Revert change to computeMinMax. This part of the change would call `ArrayUtil#getUnsignedComparator` on a length that is rarely 4 or 8. In such cases it's better to use `Arrays#compareUnsigned`.	2021-10-28 16:29:05 +02:00
Mike McCandless	512cad0e01	LUCENE-9673: fix IntBlockPool's slice allocator to actually grow properly with larger and larger slice-chained int[]; excise RAM wasted on unused (overallocated) int[] used to track in-memory postings	2021-10-28 09:37:36 -04:00
Dawid Weiss	727c6b1e0b	LUCENE-10209: Temporarily comment out gradle validation.	2021-10-27 21:12:14 +02:00
Dawid Weiss	62eb9a809e	LUCENE-10200: remove unused dangling license exclusions. Add references to the remaining ones.	2021-10-27 20:40:39 +02:00
Julie Tibshirani	abd5ec4ff0	LUCENE-9614: Fix KnnVectorQuery failure when numDocs is 0 (#413 ) When the reader has no live docs, `KnnVectorQuery` can error out. This happens because `IndexReader#numDocs` is 0, and we end up passing an illegal value of `k = 0` to the search method. This commit removes the problematic optimization in `KnnVectorQuery` and replaces it with a lower-level one based on the total number of vectors in the segment.	2021-10-27 11:08:47 -07:00
Nik Everett	941df98c3f	LUCENE-10206 Implement O(1) count on query cache (#415 ) When we load a query into the query cache we always calculate the count of matching documents. This uses that count to power the new `O(1)` `Weight#count` method.	2021-10-27 10:20:10 +02:00
Dawid Weiss	1613355149	LUCENE-10163: update smoke tester - README inside lucene/ is no longer there in the source release.	2021-10-26 21:58:20 +02:00
Dawid Weiss	4329450392	LUCENE-10198: remove debug statement that crept in.	2021-10-26 21:33:19 +02:00
Dawid Weiss	fb6aaa7b2c	LUCENE-10199: drop binary .zip artifact. (#407 )	2021-10-26 21:21:30 +02:00
Dawid Weiss	08c0356664	LUCENE-10163: clean up and remove some old cruft in readme files. Move binary release only README.md to the distribution project so that it doesn't look weird in the source tree. (#406 )	2021-10-26 21:20:42 +02:00
Dawid Weiss	780846a732	LUCENE-10198: Allow external JAVA_OPTS in gradlew scripts; use sane defaults (heap, stack and system proxies) (#405 ) Co-authored-by: balmukundblr <balmukund.mandal@intel.com>	2021-10-26 09:15:55 +02:00
Dawid Weiss	486141f0eb	LUCENE-9660: correct help/tests.txt.	2021-10-26 08:45:58 +02:00
Mayya Sharipova	2ed6e4aa78	LUCENE-10154 NumericLeafComparator to define getPointValues (#364 ) This patch adds getPointValues to NumericLeafComparator, similar to how it has getNumericDocValues. Numeric sort optimization with points relies on the assumption that points and doc values record the same information, as we substitute an iterator over doc values with one over points. If we override getNumericDocValues, it almost certainly means that whatever PointValues NumericComparator is going to look at shouldn't be used to skip non-competitive documents. Returning null for pointValues in this case will force the comparator NOT to use the sort optimization with points, and to continue in the traditional way of iterating over doc values.	2021-10-25 09:38:37 -04:00
Dawid Weiss	81f5b4d642	LUCENE-9660: add tests.neverUpToDate=true option which, by default, makes test tasks always execute. (#410 )	2021-10-25 14:51:11 +02:00
David Smiley	2719cf6630	LUCENE-9431: UnifiedHighlighter WEIGHT_MATCHES is now true by default (#362 ) Co-authored-by: Animesh Pandey <apanimesh061@gmail.com>	2021-10-22 20:40:22 -04:00
Michael McCandless	e3151d6c7d	LUCENE-10093: fix conflicting test assert to match how TieredMergePolicy (TMP) works; improve TMP javadocs (#375 )	2021-10-21 09:23:17 -04:00
Adrien Grand	8b6c90eccd	LUCENE-10165: Fix test failures.	2021-10-21 11:32:10 +02:00
Adrien Grand	9e84b2fd41	LUCENE-10165: Implement Lucene90DocValuesProducer#getMergeInstance. (#374 ) This speeds up merging by returning doc values that perform faster when all doc IDs and values are consumed.	2021-10-21 08:41:47 +02:00
Nhat Nguyen	4c2692e897	Do not run testHighOrdsSortedSetDV with SimpleTextCodec (#403 ) Avoid running testHighOrdsSortedSetDV with SimpleTextCodec as it requires a lot of memory and the bug was with Lucene90 Codec.	2021-10-20 18:22:34 -04:00
Adrien Grand	3a11983de2	LUCENE-10189: Optimize flush of doc-value fields that are effectively single-valued. (#399 )	2021-10-20 19:05:40 +02:00
Adrien Grand	0e1f9fcf31	LUCENE-10193: Cut over more array access to VarHandles. (#402 ) LZ4 is interesting because it used to read data in little-endian order even though Directory APIs were big endian. So most calls to LZ4 in backward-codecs have been changed to swap the endianness of the input/output.	2021-10-20 19:04:01 +02:00
Julie Tibshirani	6bb2bbcd6a	LUCENE-10146: Add note that dot product is preferred over cosine (#400 ) While VectorSimilarityFunction#COSINE is helpful when you need to preserve the original vectors, it is significantly slower than DOT_PRODUCT. This commit adds javadocs to COSINE explaining that dot product is the fastest option.	2021-10-20 09:50:25 -07:00
Jan Høydahl	5b8f0a5eb5	LUCENE-10174 Speed up 'pushLocal' by using uncompressed tar (#401 )	2021-10-20 14:41:24 +02:00
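
The LUCENE-10146 entry above says DOT_PRODUCT is significantly faster than COSINE. The plain-Java sketch below is not Lucene's `VectorSimilarityFunction` code; the class and method names are hypothetical. It illustrates the likely reason: cosine accumulates two extra norms and takes a square root on every comparison. If vectors are normalized to unit length once, for example before indexing, a plain dot product gives the same value.

```java
import java.util.Arrays;

// Hypothetical illustration, not Lucene's implementation.
public final class SimilaritySketch {
  private SimilaritySketch() {}

  /** Plain dot product: one multiply-add per dimension. */
  static float dotProduct(float[] a, float[] b) {
    float sum = 0f;
    for (int i = 0; i < a.length; i++) {
      sum += a[i] * b[i];
    }
    return sum;
  }

  /** Cosine: the same loop plus two norm accumulations and a square root per call. */
  static float cosine(float[] a, float[] b) {
    float dot = 0f, normA = 0f, normB = 0f;
    for (int i = 0; i < a.length; i++) {
      dot += a[i] * b[i];
      normA += a[i] * a[i];
      normB += b[i] * b[i];
    }
    return (float) (dot / Math.sqrt((double) normA * normB));
  }

  /** Returns a unit-length copy of v (assumes v is not the zero vector). */
  static float[] normalize(float[] v) {
    double norm = Math.sqrt(dotProduct(v, v));
    float[] out = Arrays.copyOf(v, v.length);
    for (int i = 0; i < out.length; i++) {
      out[i] = (float) (out[i] / norm);
    }
    return out;
  }

  public static void main(String[] args) {
    float[] a = {3f, 4f};
    float[] b = {4f, 3f};
    // Both values are approximately 0.96 (24 / 25), up to float rounding.
    System.out.println(cosine(a, b));
    System.out.println(dotProduct(normalize(a), normalize(b)));
  }
}
```

The trade-off matches the commit's framing: normalizing up front gives up the original vector magnitudes in exchange for a cheaper per-comparison function.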
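
LUCENE-10196 above adds 3-way partitioning to IntroSorter. The sketch below shows the general technique, a Dijkstra-style "Dutch national flag" partition, on a plain `int[]`. It is not Lucene's IntroSorter: the class name is hypothetical, the pivot is simply the middle element, and the introsort fallback to heapsort on deep recursion is omitted. Keys equal to the pivot are gathered in the middle and excluded from further recursion, so inputs with many duplicates do not degrade toward quadratic time.

```java
import java.util.Arrays;

// Hypothetical quicksort with 3-way partitioning; not Lucene's IntroSorter.
public final class ThreeWayQuickSort {
  private ThreeWayQuickSort() {}

  static void sort(int[] a) {
    sort(a, 0, a.length - 1);
  }

  private static void sort(int[] a, int lo, int hi) {
    while (lo < hi) {
      int pivot = a[lo + (hi - lo) / 2]; // middle element as pivot (simplification)
      int lt = lo, i = lo, gt = hi;
      // Invariant: a[lo..lt-1] < pivot, a[lt..i-1] == pivot, a[gt+1..hi] > pivot
      while (i <= gt) {
        if (a[i] < pivot) {
          swap(a, lt++, i++);
        } else if (a[i] > pivot) {
          swap(a, i, gt--);
        } else {
          i++;
        }
      }
      // a[lt..gt] now holds every key equal to the pivot, already in final position.
      // Recurse on the smaller side and loop on the larger one to bound stack depth.
      if (lt - lo < hi - gt) {
        sort(a, lo, lt - 1);
        lo = gt + 1;
      } else {
        sort(a, gt + 1, hi);
        hi = lt - 1;
      }
    }
  }

  private static void swap(int[] a, int i, int j) {
    int t = a[i];
    a[i] = a[j];
    a[j] = t;
  }

  public static void main(String[] args) {
    int[] data = {5, 1, 5, 5, 3, 5, 2, 5, 5, 4};
    sort(data);
    System.out.println(Arrays.toString(data)); // prints [1, 2, 3, 4, 5, 5, 5, 5, 5, 5]
  }
}
```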
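
LUCENE-10218 above extends the `validateSourcePatterns` Gradle task to flag LTR/RTL unicode used in "Trojan Source" attacks. The Java sketch below shows one way such a check could work; it is not the Gradle task's actual code. The class name is hypothetical, and the character set is an assumption: it covers the explicit bidi embedding, override and isolate controls (U+202A to U+202E, U+2066 to U+2069), while the real task's pattern may include more or fewer characters.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical scanner; the flagged character set is an assumption.
public final class BidiControlScanner {
  private BidiControlScanner() {}

  /** True for explicit bidi embedding/override/isolate control characters. */
  static boolean isBidiControl(int cp) {
    return (cp >= 0x202A && cp <= 0x202E) // LRE, RLE, PDF, LRO, RLO
        || (cp >= 0x2066 && cp <= 0x2069); // LRI, RLI, FSI, PDI
  }

  /** Returns "line:column U+XXXX" per hit; 1-based, column counted in UTF-16 units. */
  static List<String> scan(String source) {
    List<String> hits = new ArrayList<>();
    String[] lines = source.split("\n", -1);
    for (int ln = 0; ln < lines.length; ln++) {
      String line = lines[ln];
      for (int i = 0; i < line.length(); ) {
        int cp = line.codePointAt(i);
        if (isBidiControl(cp)) {
          hits.add((ln + 1) + ":" + (i + 1) + String.format(" U+%04X", cp));
        }
        i += Character.charCount(cp);
      }
    }
    return hits;
  }

  public static void main(String[] args) {
    // The escape keeps this file itself free of raw bidi controls.
    String src = "int x = 1;\n// comment \u202E hidden\n";
    System.out.println(scan(src)); // prints [2:12 U+202E]
  }
}
```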
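
LUCENE-10193 above moves array access to VarHandles and notes that LZ4 reads little-endian data. The sketch below uses the JDK's `MethodHandles.byteArrayViewVarHandle` (Java 9+) to read an `int` from a `byte[]` in either byte order. It illustrates the JDK mechanism only; the class and method names are hypothetical, and this is not the code in Lucene's LZ4 or backward-codecs.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteOrder;

// Hypothetical helper illustrating the JDK API, not Lucene code.
public final class VarHandleEndianness {
  private VarHandleEndianness() {}

  // Views a byte[] as int values at arbitrary byte offsets, in a fixed byte order.
  private static final VarHandle LE_INT =
      MethodHandles.byteArrayViewVarHandle(int[].class, ByteOrder.LITTLE_ENDIAN);
  private static final VarHandle BE_INT =
      MethodHandles.byteArrayViewVarHandle(int[].class, ByteOrder.BIG_ENDIAN);

  static int readIntLE(byte[] buf, int offset) {
    return (int) LE_INT.get(buf, offset);
  }

  static int readIntBE(byte[] buf, int offset) {
    return (int) BE_INT.get(buf, offset);
  }

  // Hand-written shift-and-or equivalent of readIntLE, for comparison.
  static int readIntLEManual(byte[] buf, int offset) {
    return (buf[offset] & 0xFF)
        | (buf[offset + 1] & 0xFF) << 8
        | (buf[offset + 2] & 0xFF) << 16
        | (buf[offset + 3] & 0xFF) << 24;
  }

  public static void main(String[] args) {
    byte[] buf = {0x01, 0x02, 0x03, 0x04};
    // prints LE=0x04030201 BE=0x01020304
    System.out.printf("LE=0x%08X BE=0x%08X%n", readIntLE(buf, 0), readIntBE(buf, 0));
  }
}
```

Choosing the byte order once in the VarHandle means callers that need little-endian data, as the LZ4 note describes, need no per-call byte swapping.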


35463 Commits