This PR adds support for using cosine similarity with kNN vector fields.
It takes a simple approach and doesn't attempt optimizations like normalizing
the query vector in advance, or performing loop unrolling. The thinking is that
users who prioritize efficiency can normalize all vectors in advance and use
`VectorSimilarityFunction.DOT_PRODUCT`.
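As a reference for that workaround, here is a minimal sketch (the `KnnVectorField` constructor shown is the 9.x API; the field name and helper are made up):

```java
// Normalize a vector to unit length before indexing so that
// VectorSimilarityFunction.DOT_PRODUCT ranks hits the same way cosine
// similarity would. The field name "vector" is illustrative only.
import org.apache.lucene.document.Document;
import org.apache.lucene.document.KnnVectorField;
import org.apache.lucene.index.VectorSimilarityFunction;

public final class NormalizedVectorExample {
  static float[] normalize(float[] v) {
    // Assumes a non-zero vector.
    double norm = 0;
    for (float x : v) {
      norm += x * x;
    }
    norm = Math.sqrt(norm);
    float[] unit = new float[v.length];
    for (int i = 0; i < v.length; i++) {
      unit[i] = (float) (v[i] / norm);
    }
    return unit;
  }

  static Document docWithVector(float[] raw) {
    Document doc = new Document();
    doc.add(new KnnVectorField("vector", normalize(raw), VectorSimilarityFunction.DOT_PRODUCT));
    return doc;
  }
}
```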
Instead of a vague `java.lang.AssertionError at...`, include some basic
information in the message:
`java.lang.AssertionError: size=16252835,limit=15728640,maxSegmentSizeMb=10.0`
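For reference, the kind of assertion that produces such a message looks roughly like this (names mirror the message above, not necessarily the actual code):

```java
// Attach the relevant values to the assert so a failure reports them
// instead of a bare java.lang.AssertionError. Names are illustrative.
final class SegmentSizeCheck {
  static void checkSize(long size, long limit, double maxSegmentSizeMb) {
    assert size <= limit
        : "size=" + size + ",limit=" + limit + ",maxSegmentSizeMb=" + maxSegmentSizeMb;
  }
}
```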
BaseChunkedDirectoryTestCase is an extension of BaseDirectoryTestCase
where the concrete test class instantiates the directory with a specified
chunk size. It then tests boundary conditions around the chunking.
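A concrete subclass would look roughly like this (a sketch assuming the abstract hook is `getDirectory(Path, int maxChunkSize)`; the real signature may differ slightly):

```java
// Hypothetical concrete test: the base class exercises chunk-boundary
// conditions, and the subclass only supplies a Directory built with the
// requested maximum chunk size.
import java.io.IOException;
import java.nio.file.Path;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.MMapDirectory;

public class TestMyChunkedDirectory extends BaseChunkedDirectoryTestCase {
  @Override
  protected Directory getDirectory(Path path, int maxChunkSize) throws IOException {
    return new MMapDirectory(path, maxChunkSize);
  }
}
```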
Implement the bulk readLongs() with view buffers, consistent with how
readFloats() is implemented today.
This method is important for traversing the postings lists (PFOR
decompression), and is also used for block metadata in the stored fields
decompression.
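The view-buffer idea, roughly (illustrative code, not the actual Lucene implementation, which caches its views):

```java
// Expose the backing ByteBuffer as a little-endian LongBuffer and copy
// longs in bulk instead of assembling them one byte at a time.
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.LongBuffer;

final class ViewBufferReads {
  static void readLongs(ByteBuffer bytes, long[] dst, int offset, int length) {
    // asLongBuffer() views the remaining bytes starting at the current position.
    LongBuffer view = bytes.order(ByteOrder.LITTLE_ENDIAN).asLongBuffer();
    view.get(dst, offset, length);
    // Advance the byte buffer past the longs that were just consumed.
    bytes.position(bytes.position() + length * Long.BYTES);
  }
}
```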
Optimize these relative-read methods to no longer read one byte at a time.
This speeds up common scenarios such as reading postings from an in-memory
directory or an NRT-caching directory.
Sort is used in many settings where we assume that it is immutable
(for example, in IndexWriterConfig). This commit makes it so, and also
updates the severely outdated javadoc.
`dvGen` doesn't need to be checked for schema consistency since it is always
-1. Furthermore, this change makes the `assertSame` overload that takes an
object take an enum instead, since it relies on instance equality checks,
which are generally incorrect for objects but well-defined for enums.
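To illustrate the point about instance equality (a generic example, not the actual field-infos code):

```java
// == is reliable for enums because each constant is a singleton, but not for
// arbitrary objects, which can be equal without being the same instance.
// DocValuesKind is a made-up enum for illustration.
enum DocValuesKind { NONE, NUMERIC }

final class IdentityVsEquals {
  public static void main(String[] args) {
    System.out.println(DocValuesKind.NUMERIC == DocValuesKind.NUMERIC); // always true
    System.out.println(new String("a") == new String("a"));             // false: equal, not same
    System.out.println(new String("a").equals(new String("a")));        // true
  }
}
```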
* Update wording in README and poll-mirrors.py
* First pass at updating wizard
- lucene/solr -> lucene
- removed solr-only tasks and python functions
* Update addVersion to remove Solr parts
- fixes a bug with a regex and a missing String qualifier for the gradle baseVersion
* buildAndPushRelease - remove solr parts
* githubPRs.py: report on PRs from the new lucene repo and Lucene JIRA only
* update smokeTestRelease.py example in README.md (but not smokeTestRelease.py itself)
* remove Solr references in releasedJirasRegex.py
* Update releasedJirasRegex.py
* Add gpg release signing to buildAndPushRelease.py
Co-authored-by: Christine Poerschke <cpoerschke@apache.org>
Better document these methods directly, mentioning endianness, linking
to the appropriate VarHandle constant, etc.
Add blurb to MIGRATE.txt to call out the switch to little-endian to
increase awareness.
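For reference, the kind of VarHandle constant those docs point at looks like this (class and field names here are illustrative):

```java
// Read a little-endian int from a byte[] via a VarHandle view, the mechanism
// behind the little-endian DataInput/DataOutput methods being documented.
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteOrder;

final class LittleEndianReads {
  private static final VarHandle INT_LE =
      MethodHandles.byteArrayViewVarHandle(int[].class, ByteOrder.LITTLE_ENDIAN);

  static int readIntLE(byte[] bytes, int offset) {
    return (int) INT_LE.get(bytes, offset);
  }
}
```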
Expand the log message when CMS.MergeThread completes its merge operation
to include additional useful diagnostic information, such as the total bytes
written, the time taken, and rate limiter information. Also, while here, unify
the thread start and end log output to help improve tracing.
The Gradle plugin portal uses JCenter to resolve third-party plugins, which
can be flaky. This commit instructs Gradle to look first in Maven Central,
and only use the plugin portal for Gradle's own plugins.
This commit adds a new `addDiagnostics` method to `SegmentInfo` that
allows custom merge policies to add new diagnostic information to the
segment's diagnostic map.
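A minimal sketch of how a custom merge policy might use it, assuming the method has the shape `addDiagnostics(Map<String, String>)` and merges the entries into the existing diagnostics map (the keys below are made up):

```java
// Record custom diagnostics on a merged segment. The surrounding hook where a
// merge policy obtains the SegmentInfo is omitted; only the call is shown.
import java.util.Map;
import org.apache.lucene.index.SegmentInfo;

final class MergeDiagnostics {
  static void tag(SegmentInfo info) {
    info.addDiagnostics(Map.of(
        "myMergePolicy.reason", "forced",
        "myMergePolicy.tier", "3"));
  }
}
```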
There was a regression introduced in
https://github.com/apache/lucene/pull/107/files#diff-49b11ced76acedf749c5a5a0ff6e7fe93b8fb64caf8697e487a56f4f7adbb510
where we moved from write logic that was specialized for each number of bits
per value to more general logic that has to handle any number of bits per
value. This PR doesn't restore the full specialization, but settles on a
middle ground that makes flushes and merges of doc values noticeably faster
again (though not as fast as the fully specialized logic).
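To illustrate the kind of specialization involved (not the actual doc values writer code):

```java
// A generic bit-packing loop that handles any bitsPerValue vs. a specialized
// path for a byte-aligned case; the middle ground described above keeps a few
// such specialized paths for common cases. Output arrays are assumed zeroed.
final class BitPacking {
  // Generic: works for any bitsPerValue from 1 to 64, one bit at a time.
  static void packGeneric(long[] values, int bitsPerValue, byte[] out) {
    long bitOffset = 0;
    for (long v : values) {
      for (int b = 0; b < bitsPerValue; b++, bitOffset++) {
        if (((v >>> b) & 1L) != 0) {
          out[(int) (bitOffset >>> 3)] |= (byte) (1 << (bitOffset & 7));
        }
      }
    }
  }

  // Specialized: bitsPerValue == 8, just copy the low byte of each value.
  static void pack8(long[] values, byte[] out) {
    for (int i = 0; i < values.length; i++) {
      out[i] = (byte) values[i];
    }
  }
}
```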
Many tests are failing due to the newly introduced chunk scoring in
AssertingBulkScorer. This commit reverts that change so that it can be
reintroduced later.
Even though it was not the driver for the slowdown, in LUCENE-10125 we
identified that the move to PFOR had slowed down indexing significantly
for fields indexed with indexOptions=DOCS. This patch gets some of the
performance back by using the `LongHeap` that we introduced for vectors
instead of sorting the same array over and over again.
On the NYC Taxis benchmark, I observed ~8% faster merges of postings
with this change.
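The heap-vs-sort idea, using `java.util.PriorityQueue` as a stand-in for Lucene's `LongHeap` (illustration only, not the actual postings-writer code):

```java
// Find the few largest values in a block with a small bounded min-heap
// instead of sorting the whole block on every call.
import java.util.PriorityQueue;

final class TopValues {
  static long[] topK(long[] block, int k) {
    PriorityQueue<Long> heap = new PriorityQueue<>(k); // holds the k largest seen so far
    for (long v : block) {
      if (heap.size() < k) {
        heap.add(v);
      } else if (v > heap.peek()) {
        heap.poll();
        heap.add(v);
      }
    }
    long[] top = new long[heap.size()];
    for (int i = 0; i < top.length; i++) {
      top[i] = heap.poll();
    }
    return top;
  }
}
```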
It seems that VectorFormat merge creates a LOT of these bitsets. We don't
need to do any fancy reflection here via `shallowSizeOf(Object)` when we can
call `sizeOf(long[])`, which is fast.
We may want to revisit the `RamUsageEstimator` API in the future to prevent
traps like this.
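The difference, in code (the helper class is illustrative):

```java
// sizeOf(long[]) computes the array footprint arithmetically, while
// shallowSizeOf(Object) has to inspect the object's class reflectively first.
import org.apache.lucene.util.RamUsageEstimator;

final class BitSetAccounting {
  // Fast path used by the fix: size a primitive array directly.
  static long bytesUsed(long[] bits) {
    return RamUsageEstimator.sizeOf(bits);
  }

  // Slower path this commit avoids in the hot loop.
  static long shallowBytesUsed(Object bitset) {
    return RamUsageEstimator.shallowSizeOf(bitset);
  }
}
```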