* LUCENE-9997 Revisit smoketester for 9.0 build
* Remove checkBrokenLinks
* Add back checkBrokenLinks
* Review feedback. Remove traces of solr-specific testNotice() method
Move backCompat test up to other "if isSrc" block
* Review feedback. Bring back the 'checkMaven()' method, as it checks lucene maven artifacts.
But since we don't have pom template files anymore, no need to compare with templates
* Review feedback. Fix script compatibility by comparing against X.Y instead of X.Y.Z
* Review feedback. Remove unnecessary if lucene test
Convert some ant commands to gradle
* Update MANIFEST tests to match the gradle-produced manifest
* LUCENE-10107 Read multi-line commit from Manifest
Backport from branch_8x
* Collapse for project in 'lucene' loops and methods taking 'project' as argument
Disable checkJavadocLinks, as this dependency no longer exists in 'scripts' folder
* Review feedback - fix more ant stuff, convert to gradle equivalent
* Review feedback: Refactor file open
* Comment out javadoc generation - was only used to check broken links?
* Fix charset of gpg console output to always be utf-8
Fix two more places to use with open()
* Accept 'LICENSE' without txt or md suffix in top-level
* Disable vector dictionary abuse exception if started with -Dsmoketester
* Reformat code
* Use -Dsmoketester flag when invoking IndexFiles
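For the multi-line manifest read mentioned above (LUCENE-10107), here is a minimal sketch of how folded manifest headers can be joined. It relies on the JAR manifest convention that a continuation line starts with a single space. The function name `unfold_manifest` and the sample values are hypothetical and are not taken from smokeTestRelease.py.

```python
def unfold_manifest(text):
    """Parse 'Name: value' headers, joining continuation lines.

    Hypothetical helper for illustration only.
    """
    headers = {}
    current = None
    for line in text.splitlines():
        if line.startswith(" ") and current is not None:
            # Continuation line: drop the single leading space and append.
            headers[current] += line[1:]
        elif ": " in line:
            current, value = line.split(": ", 1)
            headers[current] = value
        else:
            # A blank (or unrecognized) line ends the current header.
            current = None
    return headers


# Made-up sample: the second header is folded across two lines.
SAMPLE = (
    "Manifest-Version: 1.0\n"
    "Implementation-Version: 9.0.0 0123456789abcdef0123456789abcdef0123\n"
    " 4567 - builder - 2021-09-20\n"
)

if __name__ == "__main__":
    for name, value in unfold_manifest(SAMPLE).items():
        print(name, "=", value)
```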
Some interval iterators will attempt to minimize themselves by moving
sub-iterators forward until they are no longer positioned within the
current match. This causes problems when we try to pull Matches
for these iterators, as their sub-iterators are now out of position. We
have previously tried to deal with this by introducing caching iterators
that check to see if they have been moved beyond the end of the current
interval, but this fails in cases where an interval can contain multiple
copies of a particular iterator.
This commit adds the ability for minimizing iterators to signal to their
children when a prospective match has been found, so that they can
cache their positions and offsets.
Co-authored-by: Nikolay Khitrin <khitrin@gmail.com>
This PR adds support for using cosine similarity with kNN vector fields.
It takes a simple approach and doesn't attempt optimizations like normalizing
the query vector in advance, or performing loop unrolling. The thinking is that
users who prioritize efficiency can normalize all vectors in advance and use
`VectorSimilarityFunction.DOT_PRODUCT`.
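To illustrate the trade-off described above, here is a minimal sketch in plain Python (not Lucene's Java code) showing that the cosine similarity of two vectors equals the dot product of their normalized forms. The helper names `dot`, `cosine`, and `normalize` are assumptions made for this example.

```python
import math


def dot(a, b):
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def cosine(a, b):
    """Cosine similarity computed directly, with no pre-normalization."""
    return dot(a, b) / math.sqrt(dot(a, a) * dot(b, b))


def normalize(v):
    """Scale v to unit length."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


if __name__ == "__main__":
    a, b = [3.0, 4.0], [4.0, 3.0]
    # Both lines compute the same similarity, up to floating-point rounding.
    print(cosine(a, b))
    print(dot(normalize(a), normalize(b)))
```

Normalizing once at index time moves the square roots and divisions out of the per-comparison path, which is why the dot product variant can be cheaper.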
Instead of a vague: java.lang.AssertionError at..., include some basic
information:
java.lang.AssertionError: size=16252835,limit=15728640,maxSegmentSizeMb=10.0
BaseChunkedDirectoryTestCase is an extension of BaseDirectoryTestCase
where the concrete test class instantiates with a specified chunk size.
It then tries to test boundary conditions around all the chunking.
Implement the bulk readLongs() with view buffers, consistent with how
readFloats() is implemented today.
This method is important for traversing the postings lists (PFOR
decompression), and is also used for block metadata in the stored fields
decompression.
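As a rough illustration of bulk versus byte-at-a-time decoding, the sketch below reads signed 64-bit values from a byte buffer in two ways. It is a Python analogy, not the Java implementation: `struct.unpack_from` stands in for the view-buffer approach, little-endian layout is assumed, and both function names are hypothetical.

```python
import struct


def read_longs_bytewise(buf, offset, count):
    """Assemble each little-endian long one byte at a time."""
    out = []
    for i in range(count):
        value = 0
        for j in range(8):
            value |= buf[offset + 8 * i + j] << (8 * j)
        # Reinterpret the unsigned result as a signed 64-bit value.
        if value >= 1 << 63:
            value -= 1 << 64
        out.append(value)
    return out


def read_longs_bulk(buf, offset, count):
    """Decode all longs in one call; '<' selects little-endian, 'q' is int64."""
    return list(struct.unpack_from("<%dq" % count, buf, offset))


if __name__ == "__main__":
    data = b"\x00" + struct.pack("<3q", 1, -2, 2 ** 62)
    # Both approaches agree, even at an unaligned offset.
    print(read_longs_bytewise(data, 1, 3))
    print(read_longs_bulk(data, 1, 3))
```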
Optimize these relative-read methods to no longer read
one-byte-at-a-time.
This speeds up common scenarios such as reading postings from in-memory
directory / nrt-caching directory.
Sort is used in all sorts of settings where we assume that it is immutable
(for example, in IndexWriterConfig). This commit makes it so, plus it also
updates the severely outdated javadoc.
`dvGen` doesn't need to be checked for schema consistency since it is always
-1. Furthermore, this change makes the `assertSame` overload that took an
object take an enum instead, since it relies on instance equality checks, which
are generally incorrect for objects.
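The distinction between instance equality and value equality can be sketched as follows. This is a Python analogy (`is` versus `==`), not the Java code in question, and the `Kind` and `Attributes` types are hypothetical stand-ins.

```python
from dataclasses import dataclass
from enum import Enum


class Kind(Enum):
    """Hypothetical enum: each member is a singleton."""
    NONE = 0
    NUMERIC = 1


@dataclass(frozen=True)
class Attributes:
    """Hypothetical value object: equal content, possibly distinct instances."""
    name: str


def same_instance(a, b):
    """Instance (identity) equality, analogous to Java's == on references."""
    return a is b


if __name__ == "__main__":
    x, y = Attributes("field"), Attributes("field")
    print(x == y)               # value equality holds
    print(same_instance(x, y))  # identity does not: two separate objects
    # Enum members are singletons, so identity checks are reliable for them.
    print(same_instance(Kind.NUMERIC, Kind(1)))
```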
* Update wording in README and poll-mirrors.py
* First pass at updating wizard
- lucene/solr -> lucene
- removed solr-only tasks and python functions
* Update addVersion to remove Solr parts
- fixes bug with a regex and missing String qualifier for gradle baseVersion
* buildAndPushRelease - remove solr parts
* githubPRs.py report on PRs from new lucene repo and lucene JIRA only
* update smokeTestRelease.py example in README.md (but not smokeTestRelease.py itself)
* remove Solr references in releasedJirasRegex.py
* Update releasedJirasRegex.py
* Add gpg release signing to buildAndPushRelease.py
Co-authored-by: Christine Poerschke <cpoerschke@apache.org>
Better document these methods directly, mentioning endianness, linking
to appropriate varhandle constant, etc.
Add blurb to MIGRATE.txt to call out the switch to little-endian to
increase awareness.
Expand the log message when CMS.MergeThread completes its merge operation,
to include additional useful diagnostic information, like the total-bytes-written,
the time taken, as well as rate limiter information. Also, while here, unify the
thread start and end log output to help improve tracing.
The gradle plugin portal uses jcenter to resolve third-party plugins, which
can be flaky. This commit instructs gradle to look first in maven central,
and only use the plugin portal for gradle's own plugins.
This commit adds a new `addDiagnostics` method to `SegmentInfo` that
allows custom merge policies to add new diagnostic information to the
segment's diagnostic map.
There was a regression introduced in
https://github.com/apache/lucene/pull/107/files#diff-49b11ced76acedf749c5a5a0ff6e7fe93b8fb64caf8697e487a56f4f7adbb510
where we moved from write logic that was optimized for every number of bits per
value to more general logic that had to work for every number of bits per value.
This PR doesn't restore as much specialization, but finds some middle ground
that makes flushes and merges of doc values noticeably, though not
dramatically, faster.