Commit Graph

35393 Commits

Author SHA1 Message Date
Stefan Vodita 560f71b47d
LUCENE-10129: Add RamUsageEstimator.shallowSizeOf() for primitive arrays (#367)
Co-authored-by: Stefan Vodita <voditas@amazon.com>
2021-10-15 15:45:04 +02:00
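The sketch below shows what a "shallow size" means for a primitive array. It assumes a 64-bit JVM with compressed oops (a 16-byte array header and 8-byte object alignment). These constants are hard-coded assumptions, and a real estimator has to account for the JVM it actually runs on. The class name `ShallowArraySize` is hypothetical and is not Lucene's `RamUsageEstimator`.

```java
// Hypothetical helper; constants assume a 64-bit JVM with compressed oops.
final class ShallowArraySize {
  static final long ARRAY_HEADER_BYTES = 16; // assumption: 12-byte object header + 4-byte length
  static final long OBJECT_ALIGNMENT = 8;    // assumption: default 8-byte object alignment

  // Rounds a raw size up to the next multiple of the object alignment.
  static long alignUp(long size) {
    return (size + OBJECT_ALIGNMENT - 1) / OBJECT_ALIGNMENT * OBJECT_ALIGNMENT;
  }

  // A primitive array holds no references, so its shallow size is header plus element bytes.
  static long shallowSizeOf(byte[] arr) {
    return alignUp(ARRAY_HEADER_BYTES + (long) Byte.BYTES * arr.length);
  }

  static long shallowSizeOf(int[] arr) {
    return alignUp(ARRAY_HEADER_BYTES + (long) Integer.BYTES * arr.length);
  }

  static long shallowSizeOf(long[] arr) {
    return alignUp(ARRAY_HEADER_BYTES + (long) Long.BYTES * arr.length);
  }
}
```

Under these assumptions, a `long[3]` takes 16 + 24 = 40 bytes, and a `byte[1]` rounds up from 17 to 24 bytes.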
Mayya Sharipova c9e56d27a3
LUCENE-10178 Add toString method for Lucene90HnswVectorsFormat (#383)
Add toString method to Lucene90HnswVectorsFormat for testing
and debugging.
2021-10-15 09:09:27 -04:00
Shintaro Murakami 7d5df2d6fe
Remove redundant null check (#378)
In the commit method, mgr is already used without a null check before this null check.
2021-10-15 07:38:46 -04:00
Mike Drob 95759d299e
Fix typo 2021-10-14 13:28:10 -05:00
Chris Hostetter f64c81c3f8 LUCENE-10173: remove max-worker restriction added by LUCENE-9488 when 'useGpg' in effect
Also update docs to remove the point of confusion that led to thinking that restriction was useful
2021-10-14 10:50:16 -07:00
Tommaso Teofili cfd9f9f98f
LUCENE-10172 - minor Java code improvements to Lucene Classification (#381)
* LUCENE-10172 - minor code improvements

* LUCENE-10172 - spotlessApply
2021-10-14 10:04:33 +02:00
Adrien Grand c36ce300ae
LUCENE-10170: Restore compression speed for LZ4. (#377)
A slowdown had been introduced in LUCENE-7521.
2021-10-14 08:21:15 +02:00
Jan Høydahl ae956db41c
LUCENE-9997 Revisit smoketester for 9.0 build (#355)
* LUCENE-9997 Revisit smoketester for 9.0 build

* Remove checkBrokenLinks

* Add back checkBrokenLinks

* Review feedback. Remove traces of solr-specific testNotice() method
Move backCompat test up to other "if isSrc" block

* Review feedback. Bring back the 'checkMaven()' method, as it checks lucene maven artifacts.
But since we don't have pom template files anymore, no need to compare with templates

* Review feedback. Fix script compatibility by comparing against X.Y instead of X.Y.Z

* Review feedback. Remove unnecessary if lucene test
Convert some ant commands to gradle

* Update MANIFEST tests to match the gradle-produced manifest

* LUCENE-10107 Read multi-line commit from Manifest
Backport from branch_8x

* Collapse for project in 'lucene' loops and methods taking 'project' as argument
Disable checkJavadocLinks, as this dependency no longer exists in 'scripts' folder

* Review feedback - fix more ant stuff, convert to gradle equivalent

* Review feedback: Refactor file open

* Comment out javadoc generation - was only used to check broken links?

* Fix charset of gpg console output to always be utf-8
Fix two more places to use with open()

* Accept 'LICENSE' without txt or md suffix in top-level

* Disable vector dictionary abuse exception if started with -Dsmoketester

* Reformat code

* Use -Dsmoketester flag when invoking IndexFiles
2021-10-13 15:24:14 +02:00
Patrick Zhai 6a41bc6310
LUCENE-10103 Make QueryCache respect Accountable queries (#346) 2021-10-13 09:10:09 -04:00
Dawid Weiss 8bcc3dc430
LUCENE-9488: rewrite distribution assembly, signing and checksum generation (#372) 2021-10-13 11:50:58 +02:00
Dawid Weiss dad926ad17
LUCENE-10167: Run tests on PRs (and pushes to the main branch) (#376) 2021-10-12 15:19:34 +02:00
Alan Woodward ca073c98fa
LUCENE-10140: Correct minimizing iterator sub-matches (#370)
Some interval iterators will attempt to minimize themselves by moving
sub-iterators forward until they are no longer positioned within the 
current match.  This causes problems when we try and pull Matches
for these iterators, as their sub-iterators are now out of position.  We
have previously tried to deal with this by introducing caching iterators
that check to see if they have been moved beyond the end of the current
interval, but this fails in cases where an interval can contain multiple
copies of a particular iterator.

This commit adds the ability for minimizing iterators to signal to their
children when a prospective match has been found, so that they can
cache their positions and offsets.

Co-authored-by: Nikolay Khitrin <khitrin@gmail.com>
2021-10-12 09:33:36 +01:00
Robert Muir f67dec1739
LUCENE-10164: lucene/replicator should only have jetty as a test dependency (#373) 2021-10-11 13:53:58 -04:00
Julie Tibshirani f4861159c3
LUCENE-10146: Add VectorSimilarityFunction.COSINE (#366)
This PR adds support for using cosine similarity with kNN vector fields.

It takes a simple approach and doesn't attempt optimizations like normalizing
the query vector in advance, or performing loop unrolling. The thinking is that
users who prioritize efficiency can normalize all vectors in advance and use
`VectorSimilarityFunction.DOT_PRODUCT`.
2021-10-11 08:49:19 -07:00
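The trade-off described above can be checked with a small sketch: cosine similarity computed directly, and the same value obtained as a plain dot product once both vectors are normalized in advance. The class and method names are hypothetical and do not reflect Lucene's `VectorSimilarityFunction` implementation; the sketch assumes non-zero vectors of equal length.

```java
// Hypothetical sketch; assumes non-zero vectors of equal dimension.
final class CosineSketch {
  // Plain dot product of two equal-length vectors.
  static float dot(float[] a, float[] b) {
    if (a.length != b.length) throw new IllegalArgumentException("dimension mismatch");
    float sum = 0f;
    for (int i = 0; i < a.length; i++) sum += a[i] * b[i];
    return sum;
  }

  // Cosine similarity computed directly: dot(a, b) / (|a| * |b|).
  static float cosine(float[] a, float[] b) {
    return (float) (dot(a, b) / Math.sqrt((double) dot(a, a) * dot(b, b)));
  }

  // Returns a unit-length copy of v; dot products of unit vectors equal their cosine.
  static float[] normalize(float[] v) {
    float norm = (float) Math.sqrt(dot(v, v));
    float[] out = new float[v.length];
    for (int i = 0; i < v.length; i++) out[i] = v[i] / norm;
    return out;
  }
}
```

Normalizing once at index time moves the two square-root norms out of every comparison, which is the efficiency argument for using `DOT_PRODUCT` on pre-normalized vectors.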
jimczi ed69f6080f Update CHANGES entry for 8.10.1 2021-10-11 11:13:58 +02:00
Uwe Schindler c94aca7e5d
LUCENE-10158: Add a new interface Unwrappable to the utils package to ease migration to new MMAPDirectory and its testing (#369) 2021-10-11 00:25:40 +02:00
Mayya Sharipova 6f232b6f4b Add CHANGES entry for 8.10.1 2021-10-10 07:43:08 -04:00
Robert Muir c1fe9efb4b
LUCENE-10160: improve assert to be easier to debug
Instead of a vague "java.lang.AssertionError at ...", include some basic
information:

java.lang.AssertionError: size=16252835,limit=15728640,maxSegmentSizeMb=10.0
2021-10-09 12:33:29 -04:00
Robert Muir 6c6a3bd5bd
LUCENE-10155: Refactor TestMultiMMap into a BaseChunkedDirectoryTestCase (#360)
BaseChunkedDirectoryTestCase is an extension of BaseDirectoryTestCase
where the concrete test class instantiates with a specified chunk size.
It then tries to test boundary conditions around all the chunking.
2021-10-09 11:55:41 -04:00
Robert Muir 61c15c8c10
LUCENE-10150: override readLongs() in ByteBuffersDataInput (#363)
Implement the bulk readLongs() with view buffers, consistent with how
readFloats() is implemented today.

This method is important for traversing the postings lists (PFOR
decompression), and is also used for block metadata in the stored fields
decompression.
2021-10-09 11:54:17 -04:00
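A bulk read through a view buffer can be sketched with standard NIO. The sketch assumes a single heap `ByteBuffer` holding little-endian longs and ignores the multi-block layout that `ByteBuffersDataInput` actually manages. `ViewBufferReadLongs` is a hypothetical name.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical sketch over a single ByteBuffer; not ByteBuffersDataInput itself.
final class ViewBufferReadLongs {
  // Bulk-reads `len` little-endian longs from src's current position into
  // dst[offset..offset+len), then advances src past the consumed bytes.
  static void readLongs(ByteBuffer src, long[] dst, int offset, int len) {
    // slice() always starts out big-endian, so the byte order is set on the slice itself.
    src.slice().order(ByteOrder.LITTLE_ENDIAN).asLongBuffer().get(dst, offset, len);
    src.position(src.position() + len * Long.BYTES);
  }
}
```

The `LongBuffer` view copies all values in one `get` call instead of decoding each long from individual bytes.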
Dawid Weiss a613021ca4
LUCENE-10136: allow 'var' declarations in source code (be reasonable though). (#368) 2021-10-08 20:20:22 +02:00
Michael Sokolov 9b1fc0ecc8
LUCENE-10147: ensure that KnnVectorQuery scores are positive (#361) 2021-10-07 14:09:48 -04:00
Robert Muir ba75dc5e6b
LUCENE-10150: override ByteBuffersDataInput readLong/readInt/readShort
Optimize these relative-read methods to no longer read
one-byte-at-a-time.

This speeds up common scenarios such as reading postings from in-memory
directory / nrt-caching directory.
2021-10-06 17:33:17 -04:00
Adrien Grand 5511bcea05
LUCENE-10153: Speed up BKDWriter using VarHandles. (#357) 2021-10-06 19:16:19 +02:00
Alan Woodward 9e9c3bd249
LUCENE-9325: Make Sort final (#338)
Sort is used in all sorts of settings where we assume that it is immutable
(for example, in IndexWriterConfig). This commit makes it so, plus it also
updates the severely outdated javadoc.
2021-10-06 17:13:24 +01:00
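The general pattern for a class that other components may safely assume is immutable looks like this: a `final` class, `final` fields, and defensive copies of any array passed in or handed out. This is a hypothetical stand-in that stores field names as strings. It is not Lucene's `Sort`, which holds `SortField` instances.

```java
import java.util.Arrays;

// Hypothetical immutable stand-in; `final` prevents subclasses from adding mutable state.
final class ImmutableSortSketch {
  private final String[] fields;

  ImmutableSortSketch(String... fields) {
    this.fields = fields.clone(); // copy, so later changes to the caller's array are not seen
  }

  String[] getFields() {
    return fields.clone(); // copy, so callers cannot modify internal state
  }

  @Override
  public boolean equals(Object o) {
    return o instanceof ImmutableSortSketch && Arrays.equals(fields, ((ImmutableSortSketch) o).fields);
  }

  @Override
  public int hashCode() {
    return Arrays.hashCode(fields);
  }
}
```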
Jan Høydahl b20ffa5b2b
LUCENE-10152 Fix sha512 file syntax (#356) 2021-10-06 14:10:26 +02:00
Adrien Grand feac4cd09e
LUCENE-10182: No longer check dvGen. (#350)
`dvGen` doesn't need to be checked for schema consistency since it is always
-1. Furthermore, this change makes `assertSame` take an enum instead of an
object, since it uses instance equality checks, which are generally incorrect
for objects.
2021-10-06 11:49:13 +02:00
Jan Høydahl 674b66dd16
LUCENE-9809 Adapt Release Wizard to only release Lucene (#344)
* Update wording in README and poll-mirrors.py
* First pass at updating wizard
- lucene/solr -> lucene
- removed solr-only tasks and python functions
* Update addVersion to remove Solr parts
- fixes bug with a regex and missing String qualifier for gradle baseVersion
* buildAndPushRelease - remove solr parts
* githubPRs.py report on PRs from new lucene repo and lucene JIRA only
* update smokeTestRelease.py example in README.md (but not smokeTestRelease.py itself)
* remove Solr references in releasedJirasRegex.py
* Update releasedJirasRegex.py
* Add gpg release signing to buildAndPushRelease.py

Co-authored-by: Christine Poerschke <cpoerschke@apache.org>
2021-10-05 23:33:59 +02:00
Jan Høydahl 5cd0d68a06
LUCENE-9488 Assemble source tar, with checksum and signing (#353)
Also see LUCENE-10152 for a future cleanup of this code
2021-10-05 22:30:45 +02:00
Robert Muir 321d274b79
Fix DataInput/Output/RandomAccessInput javadocs, MIGRATE.txt to document endianness
Better document these methods directly, mentioning endianness, linking
to appropriate varhandle constant, etc.

Add blurb to MIGRATE.txt to call out the switch to little-endian to
increase awareness.
2021-10-05 13:03:05 -04:00
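The endianness difference can be seen by encoding the same `int` both ways. The sketch uses a little-endian `VarHandle` view over a `byte[]` and the default big-endian `ByteBuffer` for comparison. `EndiannessDemo` is a hypothetical name.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical demo of little- vs big-endian encoding of one int.
final class EndiannessDemo {
  private static final VarHandle LE_INT =
      MethodHandles.byteArrayViewVarHandle(int[].class, ByteOrder.LITTLE_ENDIAN);

  // Least significant byte first.
  static byte[] littleEndian(int v) {
    byte[] out = new byte[Integer.BYTES];
    LE_INT.set(out, 0, v);
    return out;
  }

  // Most significant byte first (ByteBuffer's default order).
  static byte[] bigEndian(int v) {
    return ByteBuffer.allocate(Integer.BYTES).putInt(v).array();
  }
}
```

For `0x01020304`, the little-endian bytes are `04 03 02 01` and the big-endian bytes are `01 02 03 04`.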
Uwe Schindler 9e0f3758d2
LUCENE-10143: Delegate primitive writes in RateLimitedIndexOutput (#352) 2021-10-05 14:02:22 +02:00
Greg Miller 5d2a031159
LUCENE-10134: Add CHANGES entry (#351) 2021-10-04 16:04:39 -07:00
Nhat Nguyen 92a53d3601 LUCENE-10126: Add CHANGES entry 2021-10-04 15:44:11 -04:00
Nhat Nguyen 45e8f639b0 LUCENE-10119: Add CHANGES entry 2021-10-04 15:43:17 -04:00
Nhat Nguyen c18e623b9a LUCENE-10106: Add CHANGES entry 2021-10-04 15:42:55 -04:00
Adrien Grand 18fc6c1f3e
LUCENE-10145: Speed up byte[] comparisons using VarHandles. (#349) 2021-10-04 18:35:27 +02:00
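One common way to speed up unsigned `byte[]` comparison with VarHandles is to read eight bytes at a time as a big-endian `long` and compare the longs as unsigned values. Big-endian order keeps the first differing byte the most significant, so the result matches byte-by-byte comparison. This is a sketch of that technique under that assumption, not the code from the commit. `VarHandleCompare` is hypothetical.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteOrder;

// Hypothetical sketch: unsigned lexicographic comparison, 8 bytes per step.
final class VarHandleCompare {
  private static final VarHandle BE_LONG =
      MethodHandles.byteArrayViewVarHandle(long[].class, ByteOrder.BIG_ENDIAN);

  // Compares a[aFrom..aFrom+len) with b[bFrom..bFrom+len) as unsigned bytes.
  static int compareUnsigned(byte[] a, int aFrom, byte[] b, int bFrom, int len) {
    int i = 0;
    // Plain get() on a byte-array view permits unaligned offsets.
    for (; i + Long.BYTES <= len; i += Long.BYTES) {
      long x = (long) BE_LONG.get(a, aFrom + i);
      long y = (long) BE_LONG.get(b, bFrom + i);
      if (x != y) return Long.compareUnsigned(x, y);
    }
    // Compare the remaining tail one byte at a time.
    for (; i < len; i++) {
      int cmp = Integer.compare(a[aFrom + i] & 0xFF, b[bFrom + i] & 0xFF);
      if (cmp != 0) return cmp;
    }
    return 0;
  }
}
```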
Chris Hegarty 04fb8c059e
LUCENE-10118: Test fix
We need to collect messages in a thread-safe list, as we're writing from multiple
threads.
2021-10-04 12:46:30 +01:00
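The problem is that a plain `ArrayList` can lose elements or throw when several threads call `add` at once. The sketch below uses a synchronized wrapper instead. It is a generic illustration, not the test code from the commit. `ThreadSafeMessages` and its message format are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: several writer threads appending to one shared list.
final class ThreadSafeMessages {
  static List<String> collect(int threads, int messagesPerThread) {
    // The synchronized wrapper serializes add(), so no message is lost.
    List<String> messages = Collections.synchronizedList(new ArrayList<>());
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    for (int t = 0; t < threads; t++) {
      final int id = t;
      pool.execute(() -> {
        for (int i = 0; i < messagesPerThread; i++) {
          messages.add("thread-" + id + " message-" + i);
        }
      });
    }
    pool.shutdown();
    try {
      if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
        throw new IllegalStateException("writers did not finish in time");
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    }
    return messages;
  }
}
```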
Dawid Weiss 2e57a40546
LUCENE-10139: ExternalRefSorter returns a covariant with a subtype of BytesRefIterator that is Closeable. (#340) 2021-10-04 09:21:09 +02:00
Robert Muir b4fcdd9770
LUCENE-10142: use a better RNG for HNSW vectors
This code makes extensive use of Random, but uses the old legacy
java.util.Random, which is slow. Swap in SplittableRandom for better
performance.
2021-10-02 15:23:28 -04:00
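`java.util.Random` is thread-safe because each call updates a shared seed with an atomic compare-and-set. That costs cycles even when only one thread uses it. `SplittableRandom` drops this guarantee, so each instance should be confined to one thread. In the sketch below, sampling node ids is a hypothetical use chosen for illustration and does not describe how the HNSW code consumes randomness.

```java
import java.util.SplittableRandom;

// Hypothetical sketch: seeded, single-threaded random draws.
final class SeededSampler {
  // Draws `count` node ids in [0, numNodes); the same seed yields the same sequence.
  static int[] sampleNodeIds(long seed, int numNodes, int count) {
    SplittableRandom random = new SplittableRandom(seed); // not thread-safe, unlike java.util.Random
    int[] ids = new int[count];
    for (int i = 0; i < count; i++) {
      ids[i] = random.nextInt(numNodes);
    }
    return ids;
  }
}
```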
Robert Muir 3dee08a09a
LUCENE-10130: small optimizations to SparseFixedBitSet set() codepath
Don't spend so many cycles updating ramBytesUsed when setting each bit.
Avoid recomputing some shifts that the caller already computes.
2021-10-02 08:30:54 -04:00
Robert Muir d395435fa8
LUCENE-10130: HnswGraph could make use of a SparseFixedBitSet.getAndSet 2021-10-01 23:16:20 -04:00
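During graph traversal, a `getAndSet` operation lets a single call both test and mark a "visited" node. The sketch uses a dense `long[]` bitset for clarity. It does not reproduce the sparse block layout of `SparseFixedBitSet`, and `SimpleBitSet` is a hypothetical name.

```java
// Hypothetical dense bitset; SparseFixedBitSet stores bits differently.
final class SimpleBitSet {
  private final long[] words;

  SimpleBitSet(int numBits) {
    words = new long[(numBits + 63) >>> 6];
  }

  // Sets bit `index` and returns whether it was already set, touching the word once.
  boolean getAndSet(int index) {
    int wordIndex = index >>> 6;
    long mask = 1L << index; // Java takes the shift distance modulo 64 for longs
    boolean wasSet = (words[wordIndex] & mask) != 0;
    words[wordIndex] |= mask;
    return wasSet;
  }
}
```

In a traversal loop, `if (!visited.getAndSet(node)) { /* explore node */ }` replaces a separate `get` followed by `set`.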
Nhat Nguyen 5748743d91
LUCENE-10126: Re-introduce chunk scoring logic in tests (#331)
This commit re-introduces the chunk scoring logic in AssertingBulkScorer 
and enables it in TestSortOptimization.
2021-10-01 10:02:28 -04:00
goankur cb366d04d4
LUCENE-10134: Move initialization of liveDocs bits outside the constructor to avoid AssertionError (#345)
Co-authored-by: Ankur Goel <goankur@amazon.com>
2021-10-01 08:57:11 +02:00
Timothy Potter 4c97b9e3f2
LUCENE-10131: Add backcompat indices for 8.10 and add LUCENE_8_10_0 to Version (#343) 2021-09-30 14:58:23 -06:00
Dawid Weiss 4d0fabf53b LUCENE-9713: we don't need those symbol-escape checks. They're valid adoc and we don't produce PDFs. 2021-09-30 15:27:56 +02:00
Dawid Weiss 93c66e1400 LUCENE-9713: exclude .idea/ (sync with Solr's version). 2021-09-30 15:19:19 +02:00
Dawid Weiss 3aa0676194
LUCENE-9713: apply source validation to txt files outside of src/* folders. Fix offenders. (#339) 2021-09-30 15:13:42 +02:00
Dawid Weiss 1bb4554832
LUCENE-10135: Correct passage selector behavior for long matching snippets (#334) 2021-09-30 15:05:41 +02:00
Chris Hegarty 797cfbf477
LUCENE-10118: Improve CMS infostream messages (#337)
Expand the log message when CMS.MergeThread completes its merge operation,
to include additional useful diagnostic information, like the total-bytes-written,
the time taken, as well as rate limiter information. Also, while here, unify the 
thread start and end log output to help improve tracing.
2021-09-30 11:43:45 +01:00
Alan Woodward ca810e732d
LUCENE-10138: Use maven central to resolve third-party gradle plugins (#336)
The gradle plugin portal uses jcenter to resolve third-party plugins, which
can be flaky. This commit instructs gradle to look first in maven central,
and only use the plugin portal for gradle's own plugins.
2021-09-30 11:41:05 +01:00