LZ4 is interesting because it used to read data in little-endian order even
though Directory APIs were big endian. So most calls to LZ4 in backward-codecs
now swap the endianness of the input/output.
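As a rough illustration of that swap (a sketch only, not the actual
backward-codecs code; the class and method names are hypothetical), a value
stored in little-endian order can be recovered from a big-endian read by
reversing its bytes:

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class EndianSwapSketch {
  /** Reads an int through a big-endian source and reinterprets it as little-endian. */
  static int readLittleEndianInt(ByteBuffer bigEndianSource) {
    int bigEndianValue = bigEndianSource.getInt();
    return Integer.reverseBytes(bigEndianValue);
  }

  public static void main(String[] args) {
    // Write the value in little-endian order, as the old LZ4 code expected.
    ByteBuffer written = ByteBuffer.allocate(Integer.BYTES).order(ByteOrder.LITTLE_ENDIAN);
    written.putInt(0x01020304);

    // Wrap the same bytes; a freshly wrapped ByteBuffer is big-endian by default.
    ByteBuffer source = ByteBuffer.wrap(written.array());
    System.out.println(Integer.toHexString(readLittleEndianInt(source)));
  }
}
```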
While VectorSimilarityFunction#COSINE is helpful when you need to preserve the
original vectors, it is significantly slower than DOT_PRODUCT. This commit adds
javadocs to COSINE explaining that dot product is the fastest option.
* Java17 fixes
* Add to error message that the unexpected file is in lucene/ folder
* Fix gpg command utf-8 output
* Add --no-daemon to all gradle calls, and skip clean
Co-authored-by: Dawid Weiss <dawid.weiss@carrotsearch.com>
Co-Authored-by: Tomoko Uchida <tomoko.uchida.1111@gmail.com>
We introduced invalid accesses for sorted set doc values in LUCENE-9613.
However, the issue went unnoticed because the ordinals in doc values
tests aren't complex enough to use high packed bits, and the 3 padding
bytes make these invalid accesses perfectly fine. To reproduce this
issue, we need to use at least 20 bits per value for the ordinals.
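To make the 20-bit threshold concrete: assuming ordinals are packed with the
minimum number of bits needed for the largest ordinal (an assumption for this
sketch; the helper below is hypothetical and only mirrors the usual
leading-zeros formula), the largest ordinal has to reach at least
2^19 = 524288:

```java
public class BitsPerValueSketch {
  /** Bits needed to represent maxValue (assumed non-negative), never less than 1. */
  static int bitsRequired(long maxValue) {
    return Math.max(1, 64 - Long.numberOfLeadingZeros(maxValue));
  }

  public static void main(String[] args) {
    long largest19BitValue = (1L << 19) - 1; // 524287 still fits in 19 bits
    System.out.println(bitsRequired(largest19BitValue));
    System.out.println(bitsRequired(largest19BitValue + 1)); // 524288 needs one more bit
  }
}
```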
* LUCENE-10185: pass --release 11 to ECJ linter, fix JDK 17 build
Otherwise, new Java releases such as JDK 18, JDK 19, ... may have even
more new deprecations, and the build shouldn't fail in such cases.
Remove -source/-target now that we pass --release
Fix casting so ECJ understands it and creates correct call signature (UweSays: "It's ok. I know why it happens, but it's a bug in ECJ. The type safety is checked by the invokeexact")
Co-authored-by: Uwe Schindler <uschindler@apache.org>
* LUCENE-9997 Revisit smoketester for 9.0 build
* Remove checkBrokenLinks
* Add back checkBrokenLinks
* Review feedback. Remove traces of solr-specific testNotice() method
Move backCompat test up to other "if isSrc" block
* Review feedback. Bring back the 'checkMaven()' method, as it checks lucene maven artifacts.
But since we don't have pom template files anymore, there is no need to compare with templates
* Review feedback. Fix script compatibility by comparing against X.Y instead of X.Y.Z
* Review feedback. Remove unnecessary if lucene test
Convert some ant commands to gradle
* Update MANIFEST tests to match the gradle-produced manifest
* LUCENE-10107 Read multi-line commit from Manifest
Backport from branch_8x
* Collapse for project in 'lucene' loops and methods taking 'project' as argument
Disable checkJavadocLinks, as this dependency no longer exists in 'scripts' folder
* Review feedback - fix more ant stuff, convert to gradle equivalent
* Review feedback: Refactor file open
* Comment out javadoc generation - was only used to check broken links?
* Fix charset of gpg console output to always be utf-8
Fix two more places to use with open()
* Accept 'LICENSE' without txt or md suffix in top-level
* Disable vector dictionary abuse exception if started with -Dsmoketester
* Reformat code
* Use -Dsmoketester flag when invoking IndexFiles
Some interval iterators will attempt to minimize themselves by moving
sub-iterators forward until they are no longer positioned within the
current match. This causes problems when we try to pull Matches
for these iterators, as their sub-iterators are now out of position. We
have previously tried to deal with this by introducing caching iterators
that check to see if they have been moved beyond the end of the current
interval, but this fails in cases where an interval can contain multiple
copies of a particular iterator.
This commit adds the ability for minimizing iterators to signal to their
children when a prospective match has been found, so that they can
cache their positions and offsets.
Co-authored-by: Nikolay Khitrin <khitrin@gmail.com>
This PR adds support for using cosine similarity with kNN vector fields.
It takes a simple approach and doesn't attempt optimizations like normalizing
the query vector in advance, or performing loop unrolling. The thinking is that
users who prioritize efficiency can normalize all vectors in advance and use
`VectorSimilarityFunction.DOT_PRODUCT`.
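The equivalence this relies on can be sketched in plain Java (this is not the
Lucene implementation; the class and method names are hypothetical, and zero
vectors are not handled). Cosine similarity of the raw vectors equals the dot
product of the same vectors after normalizing each to unit length:

```java
public class CosineSketch {
  static float dotProduct(float[] a, float[] b) {
    float sum = 0f;
    for (int i = 0; i < a.length; i++) {
      sum += a[i] * b[i];
    }
    return sum;
  }

  /** Straightforward cosine: computes both norms on every call. */
  static float cosine(float[] a, float[] b) {
    float dot = 0f;
    float normA = 0f;
    float normB = 0f;
    for (int i = 0; i < a.length; i++) {
      dot += a[i] * b[i];
      normA += a[i] * a[i];
      normB += b[i] * b[i];
    }
    return (float) (dot / Math.sqrt((double) normA * normB));
  }

  /** Returns a unit-length copy of v; assumes v is not the zero vector. */
  static float[] normalize(float[] v) {
    float norm = (float) Math.sqrt(dotProduct(v, v));
    float[] out = new float[v.length];
    for (int i = 0; i < v.length; i++) {
      out[i] = v[i] / norm;
    }
    return out;
  }

  public static void main(String[] args) {
    float[] a = {1f, 2f, 3f};
    float[] b = {4f, -5f, 6f};
    System.out.println(cosine(a, b));
    System.out.println(dotProduct(normalize(a), normalize(b)));
  }
}
```

Normalizing once at index time moves the norm computation out of the
per-comparison loop, which is why the dot product path has less work to do.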
Instead of a vague: java.lang.AssertionError at..., include some basic
information:
java.lang.AssertionError: size=16252835,limit=15728640,maxSegmentSizeMb=10.0
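A minimal sketch of how such a message could be produced (the class, the
method names, and the size > limit condition are assumptions for illustration;
the actual check lives in the test code and may differ):

```java
public class AssertMessageSketch {
  /** Formats the values a reader needs to diagnose the failure. */
  static String describe(long size, long limit, double maxSegmentSizeMb) {
    return "size=" + size + ",limit=" + limit + ",maxSegmentSizeMb=" + maxSegmentSizeMb;
  }

  /** Fails with a descriptive message instead of a bare AssertionError. */
  static void checkSize(long size, long limit, double maxSegmentSizeMb) {
    if (size > limit) {
      throw new AssertionError(describe(size, limit, maxSegmentSizeMb));
    }
  }

  public static void main(String[] args) {
    checkSize(1024L, 15728640L, 10.0); // within the limit, so nothing is thrown
    System.out.println(describe(16252835L, 15728640L, 10.0));
  }
}
```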
BaseChunkedDirectoryTestCase is an extension of BaseDirectoryTestCase
where the concrete test class instantiates with a specified chunk size.
It then tries to test boundary conditions around all the chunking.
Implement the bulk readLongs() with view buffers, consistent with how
readFloats() is implemented today.
This method is important for traversing the postings lists (PFOR
decompression), and is also used for block metadata in the stored fields
decompression.
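The view-buffer idea can be sketched with plain java.nio (a simplified
illustration, not the actual Directory code; the class name is hypothetical
and the real implementation may manage its views differently). A LongBuffer
view over the ByteBuffer copies all values in one bulk call instead of
assembling each long byte by byte:

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.LongBuffer;

public class ReadLongsSketch {
  /** Bulk-reads length longs from the current position of in into dst[offset..]. */
  static void readLongs(ByteBuffer in, long[] dst, int offset, int length) {
    // The view starts at in.position() and inherits in's byte order.
    LongBuffer view = in.asLongBuffer();
    view.get(dst, offset, length);
    // The view has its own position, so advance the underlying buffer manually.
    in.position(in.position() + length * Long.BYTES);
  }

  public static void main(String[] args) {
    ByteBuffer buf = ByteBuffer.allocate(2 * Long.BYTES).order(ByteOrder.LITTLE_ENDIAN);
    buf.putLong(42L).putLong(-1L);
    buf.flip();

    long[] dst = new long[2];
    readLongs(buf, dst, 0, 2);
    System.out.println(dst[0] + " " + dst[1] + " remaining=" + buf.remaining());
  }
}
```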
Optimize these relative-read methods to no longer read
one-byte-at-a-time.
This speeds up common scenarios such as reading postings from in-memory
directory / nrt-caching directory.
Sort is used in all sorts of settings where we assume that it is immutable
(for example, in IndexWriterConfig). This commit makes it so, plus it also
updates the severely outdated javadoc.