35468 Commits

Author SHA1 Message Date
Nhat Nguyen
4c2692e897
Do not run testHighOrdsSortedSetDV with SimpleTextCodec (#403)
Avoid running testHighOrdsSortedSetDV with SimpleTextCodec as it 
requires a lot of memory and the bug was with Lucene90 Codec.
2021-10-20 18:22:34 -04:00
Adrien Grand
3a11983de2
LUCENE-10189: Optimize flush of doc-value fields that are effectively single-valued. (#399) 2021-10-20 19:05:40 +02:00
Adrien Grand
0e1f9fcf31
LUCENE-10193: Cut over more array access to VarHandles. (#402)
LZ4 is interesting because it used to read data in little-endian order even
though Directory APIs were big endian. So most calls to LZ4 in backward-codecs
have been changed to change the endianness of the input/output.
2021-10-20 19:04:01 +02:00
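As a rough illustration of the technique named in LUCENE-10193, a little-endian int can be read from a byte array either with manual shifts or through a `VarHandle` view. This is a minimal sketch, not the code from the commit: the class and method names are hypothetical, and only the JDK calls (`MethodHandles.byteArrayViewVarHandle`, `VarHandle.get`) are real APIs.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteOrder;

// Hypothetical helper class, for illustration only (not part of Lucene).
final class LittleEndianReads {
  // A view of byte[] as little-endian ints; the index passed to get() is a byte offset.
  private static final VarHandle INT_LE =
      MethodHandles.byteArrayViewVarHandle(int[].class, ByteOrder.LITTLE_ENDIAN);

  // Manual array access: assemble the int from four bytes, lowest byte first.
  static int readIntManual(byte[] b, int off) {
    return (b[off] & 0xFF)
        | (b[off + 1] & 0xFF) << 8
        | (b[off + 2] & 0xFF) << 16
        | (b[off + 3] & 0xFF) << 24;
  }

  // VarHandle access: a single call that the JIT can compile to one load.
  static int readIntVarHandle(byte[] b, int off) {
    return (int) INT_LE.get(b, off);
  }
}
```

Both methods return the same value for the same input; the byte order is fixed when the `VarHandle` is created, which is why code that was written for one endianness has to convert explicitly when the surrounding API uses the other.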
Julie Tibshirani
6bb2bbcd6a
LUCENE-10146: Add note that dot product is preferred over cosine (#400)
While VectorSimilarityFunction#COSINE is helpful when you need to preserve the
original vectors, it is significantly slower than DOT_PRODUCT. This commit adds
javadocs to COSINE explaining that dot product is the fastest option.
2021-10-20 09:50:25 -07:00
Jan Høydahl
5b8f0a5eb5
LUCENE-10174 Speed up 'pushLocal' by using uncompressed tar (#401) 2021-10-20 14:41:24 +02:00
Adrien Grand
f13a400b9a
LUCENE-10187: Reduce DirectWriter's padding. (#398)
It would make us more likely to detect out-of-bounds access in the future.
2021-10-20 10:30:09 +02:00
Tomoko Uchida
54418cef45
LUCENE-9997: write release revision to system temp dir (#394) 2021-10-20 07:06:30 +09:00
Jan Høydahl
c77e9ddf93
LUCENE-9997 Second pass smoketester fixes for 9.0 (#391)
* Java17 fixes

* Add to error message that the unexpected file is in lucene/ folder

* Fix gpg command utf-8 output

* Add --no-daemon to all gradle calls, and skip clean

Co-authored-by: Dawid Weiss <dawid.weiss@carrotsearch.com>
Co-Authored-by: Tomoko Uchida <tomoko.uchida.1111@gmail.com>
2021-10-19 21:24:06 +02:00
Jan Høydahl
f5486d13e6
LUCENE-10174 BuildAndPushRelease additional improvements (#396) 2021-10-19 19:48:44 +02:00
Stefan Vodita
54c5a2ce28
LUCENE-10182: Order assertion parameters correctly (#397) 2021-10-19 16:29:46 +02:00
Adrien Grand
1448e4739b
LUCENE-10180: Avoid using lambdas in SegmentMerger. (#385) 2021-10-19 15:00:20 +02:00
Nhat Nguyen
8b68bf60c9
LUCENE-10159: Fix invalid access in sorted set dv (#389)
We introduced invalid accesses for sorted set doc values in LUCENE-9613.
However, the issue has gone unnoticed because the ordinals in doc values
tests aren't complex enough to use high packed bits, and the 3 padding
bytes make these invalid accesses perfectly fine. To reproduce this
issue, we need to use at least 20 bits per value for the ordinals.
2021-10-19 08:00:00 -04:00
Dawid Weiss
6c21862a55
LUCENE-10186: Include manifest and legalese in source and javadoc jars. (#395) 2021-10-19 10:04:42 +02:00
Dawid Weiss
e290f91bb2
LUCENE-10166: removed module-level README.txt and modified a few links, removed a few obsolete instructions from 20 years ago. (#379) 2021-10-19 09:45:49 +02:00
Mayya Sharipova
6f67e8287f Add back-compat indices for 8.10.1 2021-10-18 20:38:34 -04:00
Stefan Vodita
d9e3d99ec9
LUCENE-10182: Be specific about which sizeOf() is called; rename RamUsageTester.sizeOf to ramUsed (#386)
Co-authored-by: Stefan Vodita <voditas@amazon.com>
2021-10-19 00:13:32 +02:00
Mayya Sharipova
a93dfe93c9 Add bugfix version 8.10.1 2021-10-18 18:02:05 -04:00
Robert Muir
f8d431ae44
LUCENE-10185: pass --release 11 to ECJ linter, fix JDK 17 build (#393)
* LUCENE-10185: pass --release 11 to ECJ linter, fix JDK 17 build

Otherwise, new java releases such as JDK 18, JDK 19, ... may have even
more new deprecations, the build shouldn't fail in such cases.

Remove -source/-target now that we pass --release

Fix casting so ECJ understands it and creates correct call signature (UweSays: "It's ok. I know why it happens, but it's a bug in ECJ. The type safety is checked by the invokeexact")

Co-authored-by: Uwe Schindler <uschindler@apache.org>
2021-10-18 16:43:53 -04:00
Dawid Weiss
c4c3c3270e
LUCENE-9997: Collect signed maven artifacts if -Psign is passed. (#392)
* Collect signed maven artifacts if -Psign is passed.
* Configure signing using gpg across all projects.
2021-10-18 20:58:29 +02:00
Mayya Sharipova
41fe301a21 DOAP changes for release 8.10.1 2021-10-18 11:11:15 -04:00
Jan Høydahl
175a49e54a
LUCENE-10163 Move LICENSE and NOTICE file to top level (#388)
* Add changes entry, under a new "Build" headline
2021-10-18 01:24:11 +02:00
Tomoko Uchida
18c6010e0f
LUCENE-10163: Remove pointer to a file that no longer exists (#390) 2021-10-17 18:55:33 +09:00
Tomoko Uchida
03e8192674
Specify minimum required python version for dev scripts (#387) 2021-10-17 13:45:49 +09:00
Jan Høydahl
cdfa11b158
LUCENE-10174 Update buildAndPushRelease.py for new gradle build (#382)
Co-authored-by: Tomoko Uchida <tomoko.uchida.1111@gmail.com>
2021-10-17 01:17:34 +02:00
Jan Høydahl
f38c401283
LUCENE-10179 No longer check for release status on mirrors (#384) 2021-10-15 20:25:29 +02:00
Stefan Vodita
560f71b47d
LUCENE-10129: Add RamUsageEstimator.shallowSizeOf() for primitive arrays (#367)
Co-authored-by: Stefan Vodita <voditas@amazon.com>
2021-10-15 15:45:04 +02:00
Mayya Sharipova
c9e56d27a3
LUCENE-10178 Add toString method for Lucene90HnswVectorsFormat (#383)
Add a toString method to Lucene90HnswVectorsFormat for testing
and debugging.
2021-10-15 09:09:27 -04:00
Shintaro Murakami
7d5df2d6fe
Remove redundant null check (#378)
In the commit method, mgr is already used without a null check before this null check.
2021-10-15 07:38:46 -04:00
Mike Drob
95759d299e
Fix typo 2021-10-14 13:28:10 -05:00
Chris Hostetter
f64c81c3f8 LUCENE-10173: remove max-worker restriction added by LUCENE-9488 when 'useGpg' in effect
Also update docs to remove the point of confusion that led to thinking that restriction was useful
2021-10-14 10:50:16 -07:00
Tommaso Teofili
cfd9f9f98f
LUCENE-10172 - minor java code improvements to Lucene Classification (#381)
* LUCENE-10172 - minor code improvements

* LUCENE-10172 - spotlessApply
2021-10-14 10:04:33 +02:00
Adrien Grand
c36ce300ae
LUCENE-10170: Restore compression speed for LZ4. (#377)
A slowdown had been introduced in LUCENE-7521.
2021-10-14 08:21:15 +02:00
Jan Høydahl
ae956db41c
LUCENE-9997 Revisit smoketester for 9.0 build (#355)
* LUCENE-9997 Revisit smoketester for 9.0 build

* Remove checkBrokenLinks

* Add back checkBrokenLinks

* Review feedback. Remove traces of solr-specific testNotice() method
Move backCompat test up to other "if isSrc" block

* Review feedback. Bring back the 'checkMaven()' method, as it checks lucene maven artifacts.
But since we don't have pom template files anymore, no need to compare with templates

* Review feedback. Fix script compatibility by comparing against X.Y instead of X.Y.Z

* Review feedback. Remove unnecessary if lucene test
Convert some ant commands to gradle

* Update MANIFEST tests to match the gradle-produced manifest

* LUCENE-10107 Read multi-line commit from Manifest
Backport from branch_8x

* Collapse for project in 'lucene' loops and methods taking 'project' as argument
Disable checkJavadocLinks, as this dependency no longer exists in 'scripts' folder

* Review feedback - fix more ant stuff, convert to gradle equivalent

* Review feedback: Refactor file open

* Comment out javadoc generation - was only used to check broken links?

* Fix charset of gpg console output to always be utf-8
Fix two more places to use with open()

* Accept 'LICENSE' without txt or md suffix in top-level

* Disable vector dictionary abuse exception if started with -Dsmoketester

* Reformat code

* Use -Dsmoketester flag when invoking IndexFiles
2021-10-13 15:24:14 +02:00
Patrick Zhai
6a41bc6310
LUCENE-10103 Make QueryCache respect Accountable queries (#346) 2021-10-13 09:10:09 -04:00
Dawid Weiss
8bcc3dc430
LUCENE-9488: rewrite distribution assembly, signing and checksum generation (#372) 2021-10-13 11:50:58 +02:00
Dawid Weiss
dad926ad17
LUCENE-10167: Run tests on PRs (and pushes to the main branch) (#376) 2021-10-12 15:19:34 +02:00
Alan Woodward
ca073c98fa
LUCENE-10140: Correct minimizing iterator sub-matches (#370)
Some interval iterators will attempt to minimize themselves by moving
sub-iterators forward until they are no longer positioned within the 
current match.  This causes problems when we try and pull Matches
for these iterators, as their sub-iterators are now out of position.  We
have previously tried to deal with this by introducing caching iterators
that check to see if they have been moved beyond the end of the current
interval, but this fails in cases where an interval can contain multiple
copies of a particular iterator.

This commit adds the ability for minimizing iterators to signal to their
children when a prospective match has been found, so that they can
cache their positions and offsets.

Co-authored-by: Nikolay Khitrin <khitrin@gmail.com>
2021-10-12 09:33:36 +01:00
Robert Muir
f67dec1739
LUCENE-10164: lucene/replicator should only have jetty as a test dependency (#373) 2021-10-11 13:53:58 -04:00
Julie Tibshirani
f4861159c3
LUCENE-10146: Add VectorSimilarityFunction.COSINE (#366)
This PR adds support for using cosine similarity with kNN vector fields.

It takes a simple approach and doesn't attempt optimizations like normalizing
the query vector in advance, or performing loop unrolling. The thinking is that
users who prioritize efficiency can normalize all vectors in advance and use
`VectorSimilarityFunction.DOT_PRODUCT`.
2021-10-11 08:49:19 -07:00
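The suggestion in the LUCENE-10146 message (normalize vectors up front, then use dot product) rests on the fact that the cosine of two vectors equals the dot product of their unit-length versions. The sketch below shows that relationship in plain Java. It is not Lucene's implementation, and the class and method names are hypothetical.

```java
// Hypothetical helper class, for illustration only (not part of Lucene).
final class CosineViaDotProduct {
  static float dotProduct(float[] a, float[] b) {
    float sum = 0f;
    for (int i = 0; i < a.length; i++) {
      sum += a[i] * b[i];
    }
    return sum;
  }

  // Cosine needs both norms on every comparison, hence the extra work per call.
  static float cosine(float[] a, float[] b) {
    double normA = Math.sqrt(dotProduct(a, a));
    double normB = Math.sqrt(dotProduct(b, b));
    return (float) (dotProduct(a, b) / (normA * normB));
  }

  // Returns a unit-length copy; assumes v is not the zero vector.
  static float[] normalize(float[] v) {
    float norm = (float) Math.sqrt(dotProduct(v, v));
    float[] out = new float[v.length];
    for (int i = 0; i < v.length; i++) {
      out[i] = v[i] / norm;
    }
    return out;
  }
}
```

With vectors normalized once in advance, each comparison is a single `dotProduct` call and gives the same score as `cosine` on the original vectors, up to floating-point rounding. The trade-off is that the original, unnormalized vectors are no longer what is stored.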
jimczi
ed69f6080f Update CHANGES entry for 8.10.1 2021-10-11 11:13:58 +02:00
Uwe Schindler
c94aca7e5d
LUCENE-10158: Add a new interface Unwrappable to the utils package to ease migration to new MMAPDirectory and its testing (#369) 2021-10-11 00:25:40 +02:00
Mayya Sharipova
6f232b6f4b Add CHANGES entry for 8.10.1 2021-10-10 07:43:08 -04:00
Robert Muir
c1fe9efb4b
LUCENE-10160: improve assert to be easier to debug
Instead of a vague: java.lang.AssertionError at..., include some basic
information:

java.lang.AssertionError: size=16252835,limit=15728640,maxSegmentSizeMb=10.0
2021-10-09 12:33:29 -04:00
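The pattern described in LUCENE-10160 is Java's two-argument `assert`, where the expression after the colon becomes the error's detail message. The following is a minimal sketch of that pattern with hypothetical class, method, and parameter names; it is not the code touched by the commit.

```java
// Hypothetical helper class, for illustration only (not part of Lucene).
final class SegmentSizeCheck {
  // Builds the detail message in the same shape as the example in the commit message.
  static String describe(long size, long limit, double maxSegmentSizeMb) {
    return "size=" + size + ",limit=" + limit + ",maxSegmentSizeMb=" + maxSegmentSizeMb;
  }

  // With assertions enabled (java -ea), a failure reports the values involved
  // instead of a bare java.lang.AssertionError.
  static void check(long size, long limit, double maxSegmentSizeMb) {
    assert size <= limit : describe(size, limit, maxSegmentSizeMb);
  }
}
```

The message expression is evaluated only when the assertion fails, so building the string adds no cost on the passing path.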
Robert Muir
6c6a3bd5bd
LUCENE-10155: Refactor TestMultiMMap into a BaseChunkedDirectoryTestCase (#360)
BaseChunkedDirectoryTestCase is an extension of BaseDirectoryTestCase
where the concrete test class instantiates with a specified chunk size.
It then tries to test boundary conditions around all the chunking.
2021-10-09 11:55:41 -04:00
Robert Muir
61c15c8c10
LUCENE-10150: override readLongs() in ByteBuffersDataInput (#363)
Implement the bulk readLongs() with view buffers, consistent with how
readFloats() is implemented today.

This method is important for traversing the postings lists (PFOR
decompression), and is also used for block metadata in the stored fields
decompression.
2021-10-09 11:54:17 -04:00
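To illustrate what "bulk readLongs() with view buffers" means in LUCENE-10150, the sketch below contrasts a per-value loop with a single bulk copy through `ByteBuffer.asLongBuffer()`. It assumes all requested longs sit in one `ByteBuffer`; a real multi-block input also has to handle values that straddle block boundaries. The class and method names are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper class, for illustration only (not part of Lucene).
final class BulkLongRead {
  // Baseline: one relative getLong() call per value.
  static void readLongsOneByOne(ByteBuffer buf, long[] dst, int off, int len) {
    for (int i = 0; i < len; i++) {
      dst[off + i] = buf.getLong();
    }
  }

  // Bulk: a LongBuffer view starts at buf's current position and inherits its
  // byte order, so a single get() copies all values at once.
  static void readLongsBulk(ByteBuffer buf, long[] dst, int off, int len) {
    buf.asLongBuffer().get(dst, off, len);
    // The view has its own position, so advance the source buffer manually.
    buf.position(buf.position() + len * Long.BYTES);
  }
}
```

Both methods leave the source buffer positioned just past the values that were read, so they can be swapped without changing the caller.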
Dawid Weiss
a613021ca4
LUCENE-10136: allow 'var' declarations in source code (be reasonable though). (#368) 2021-10-08 20:20:22 +02:00
Michael Sokolov
9b1fc0ecc8
LUCENE-10147: ensure that KnnVectorQuery scores are positive (#361) 2021-10-07 14:09:48 -04:00
Robert Muir
ba75dc5e6b
LUCENE-10150: override ByteBuffersDataInput readLong/readInt/readShort
Optimize these relative-read methods to no longer read
one-byte-at-a-time.

This speeds up common scenarios such as reading postings from in-memory
directory / nrt-caching directory.
2021-10-06 17:33:17 -04:00
Adrien Grand
5511bcea05
LUCENE-10153: Speed up BKDWriter using VarHandles. (#357) 2021-10-06 19:16:19 +02:00
Alan Woodward
9e9c3bd249
LUCENE-9325: Make Sort final (#338)
Sort is used in all sorts of settings where we assume that it is immutable
(for example, in IndexWriterConfig). This commit makes it so, plus it also
updates the severely outdated javadoc.
2021-10-06 17:13:24 +01:00