Commit Graph

36650 Commits

Author SHA1 Message Date
Francisco Fernández Castaño 57201aa967
Add IntField, LongField, FloatField and DoubleField (#11997)
This commit adds new IndexableFields that index both points and doc
values at once.

Closes #11199
2022-12-20 18:19:46 +01:00
Benjamin Trent 1412e559d9
Clean up KNN related backward-codecs changes (#12019) 2022-12-20 14:04:42 +01:00
Robert Muir 3ac71adbdf
Ban use of Math.fma across the entire codebase (#12014)
When FMA is not supported by the hardware, these methods fall back to
BigDecimal usage which causes them to be 2500x slower.

While most hardware in the last 10 years may have the support, out of
the box both VirtualBox and QEMU don't pass through FMA support (for the latter
at least you can tweak it with e.g. -cpu host or similar to fix this).

This creates a terrible undocumented performance trap. Prevent it from
sneaking into our codebase.
2022-12-17 08:01:22 -05:00
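
A minimal sketch of the style of code this ban steers toward: a dot product written with an ordinary multiply and add instead of `Math.fma`. The class and method names are hypothetical and are not taken from the Lucene codebase.

```java
class PlainDotProduct {
    /** Dot product using ordinary multiply and add; deliberately avoids Math.fma. */
    static float dot(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("vector lengths differ");
        }
        float sum = 0f;
        for (int i = 0; i < a.length; i++) {
            // Plain arithmetic rounds after the multiply and again after the add.
            // Math.fma(a[i], b[i], sum) would round only once, but it can be
            // dramatically slower when no hardware FMA instruction is exposed.
            sum += a[i] * b[i];
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(dot(new float[] {1f, 2f, 3f}, new float[] {4f, 5f, 6f}));
    }
}
```

The trade-off is a possible difference in the last bits of the result in exchange for predictable speed on every platform, including virtual machines that hide the FMA instruction.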
Andriy Redko 945d7fe027
Upgrade ANTLR to version 4.11.1 (#12016)
Drop 3.x compatibility (which was pickier at compile-time and prevented slow things from happening). Instead add paranoia to runtime tests, so that they fail if antlr would do something slow in the parsing. This is needed because antlrv4 is a big performance trap: https://github.com/antlr/antlr4/blob/master/doc/faq/general.md

"Q: What are the main design decisions in ANTLR4?
Ease-of-use over performance. I will worry about performance later."

It allows us to move forward with newer antlr but hopefully prevent the associated headaches.

Signed-off-by: Andriy Redko <andriy.redko@aiven.io>
Co-authored-by: Robert Muir <rmuir@apache.org>
2022-12-15 22:40:35 -05:00
Craig Taverner 3e8ef57e3f
Fix flat polygons incorrectly containing intersecting geometries (#12022) 2022-12-15 14:56:09 +01:00
Benjamin Trent 11f2bc2056
Fix SimpleTextKnnVectorsReader to handle changes introduced in GITHUB#12004 (#12024) 2022-12-15 14:49:47 +01:00
Benjamin Trent 72968d30ba
Move byte vector queries into new KnnByteVectorQuery (#12004) 2022-12-14 09:53:10 +01:00
Robert Muir 9eeab8c4a6
Remove deprecated API in 10.x (#11998) 2022-12-13 10:32:15 -05:00
Robert Muir 47f8c1baa2
Migrate away from per-segment-per-threadlocals on SegmentReader (#11998)
Add new stored fields and termvectors interfaces: IndexReader.storedFields()
and IndexReader.termVectors(). Deprecate IndexReader.document() and IndexReader.getTermVector().
The new APIs do not rely upon ThreadLocal storage for each index segment, which can greatly
reduce RAM requirements when there are many threads and/or segments.

Co-authored-by: Adrien Grand <jpountz@gmail.com>
2022-12-13 09:10:21 -05:00
Ignacio Vera ef5766aa81
Fix algorithm that chooses the bridge between a polygon and a hole (#11988) 2022-12-13 10:16:53 +01:00
Dawid Weiss 486003833f
Run spotless after javac (#12012) (#12015) 2022-12-13 08:42:04 +01:00
Robert Muir 06f9179295
Enable LongDoubleConversion error-prone check (#12010) 2022-12-12 20:55:39 -05:00
Greg Miller e34234ca6c
Remove unnecessary NaN checks from LongRange#verifyAndEncode (#12008) 2022-12-11 12:55:21 -08:00
Greg Miller 8671e29929
Some minor code cleanup in IndexSortSortedNumericDocValuesRangeQuery (#12003)
* Leverage DISI static factory methods more over custom DISI impl where possible.
* Assert points field is a single-dim.
* Bound cost estimate by the cost of the doc values field (for sparse fields).
2022-12-10 12:23:31 -08:00
gf2121 54e00df7f6
Do int compare instead of ArrayUtil#compareUnsigned4 in LatlonPointQueries (#12006) 2022-12-11 02:30:17 +08:00
gf2121 9ff989ec00
Use ByteArrayComparator to replace Arrays#compareUnsigned in some other places (#11880) 2022-12-08 23:51:08 +08:00
Alan Woodward 66127f6e69
Add support for stored fields to MemoryIndex (#11999) 2022-12-08 09:56:24 +00:00
Adrien Grand a971120d05
Make RandomAccessVectorValues an implementation detail of HNSW implementations rather than a proper API. (#11964)
`RandomAccessVectorValues` is internally used in our HNSW implementation to
provide random access to vectors, both at index and search time. In order to
better reflect this, this change does the following:
 - `RandomAccessVectorValues` moves to `org.apache.lucene.util.hnsw`.
 - `BufferingKnnVectorsWriter` no longer has a dependency on
   `RandomAccessVectorValues` and moves to `org.apache.lucene.codecs` since
   it's more of a utility class for KNN vector file formats than an index API.
   Maybe we should think of moving it near each file format that uses it
   instead.
 - `SortingCodecReader` no longer has a dependency on
   `RandomAccessVectorValues`.

Closes #10623
2022-12-08 08:49:37 +01:00
Adrien Grand 95df7e8109
Generalize range query optimization on sorted indexes to descending sorts. (#11972)
This generalizes #687 to indexes that are sorted in descending order. The main
challenge with descending sorts is that they require being able to compute the
last doc ID that matches a value, which would ideally require walking the BKD
tree in reverse order, but the API only supports moving forward. This is worked
around by maintaining a stack of `PointTree` clones to perform the search.
2022-12-08 08:38:53 +01:00
Benjamin Trent d0be9ab57c
GITHUB-11830 Better optimize storage for vector connections (#11860) 2022-12-07 08:51:54 +01:00
Karl David Wright 108462a005 Followup work for #11883 2022-12-03 08:07:10 -05:00
Costin Leau 4eba6a1284
Add exponential growth to TimeLimitingBulkScorer (#11984)
Increase the interval between timeout checks inside TimeLimitingBulkScorer at an exponential rate.

Fix #11676
2022-12-02 09:20:48 -08:00
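
The idea of exponentially spaced timeout checks can be sketched as follows. This is an illustration under stated assumptions, not the Lucene implementation: the class name, the `scoreAll` method, the starting interval, and the doubling growth factor are all hypothetical.

```java
import java.util.function.BooleanSupplier;

class ExponentialTimeoutCheck {
    /**
     * Walks doc IDs in [0, maxDoc) window by window, checking the timeout once
     * before each window. Returns how many timeout checks were performed.
     */
    static int scoreAll(int maxDoc, int initialInterval, BooleanSupplier timedOut) {
        int checks = 0;
        int interval = initialInterval;
        int doc = 0;
        while (doc < maxDoc) {
            checks++;
            if (timedOut.getAsBoolean()) {
                throw new IllegalStateException("timed out at doc " + doc);
            }
            int end = (int) Math.min((long) doc + interval, maxDoc);
            // A real scorer would score the docs in [doc, end) here.
            doc = end;
            // Assumed growth policy: double the window, guarding against overflow.
            if (interval <= Integer.MAX_VALUE / 2) {
                interval *= 2;
            }
        }
        return checks;
    }

    public static void main(String[] args) {
        System.out.println(scoreAll(1000, 100, () -> false));
    }
}
```

With a fixed window of 100, scanning 1000 docs would need 10 checks; with the doubling windows above (100, 200, 400, then the remainder) it needs 4, at the cost of noticing a timeout later on long scans.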
Dawid Weiss 1f741ff63c
Upgrade gradle to 7.6. (#11993) 2022-12-02 09:18:38 +01:00
Robert Muir fad3108b27
fix wrong serialization by ShapeDocValues (#11974)
Closes #11973
2022-12-01 20:32:42 -05:00
Robert Muir 0a9bb6e2ac
Disable useless error-prone checks (libraries/frameworks we do not use) (#11971)
These are easy/obvious ones to disable since we don't use the
functionality at all: the checks are literally useless.

This gives some performance boost to error-prone, although it is
still pretty slow.

triage most of the previously disabled checks into TODO, noisy, etc
2022-12-01 08:46:23 -05:00
Alan Woodward 72ff140f5a
Don't let merged passages push out lower-scoring ones (#11990)
PassageScorer uses a priority queue of size maxPassages to keep track of
which highlighted passages are worth returning to the user. Once all
passages have been collected, we go through and merge overlapping
passages together, but this reduction in the number of passages is not
compensated for by re-adding the highest-scoring passages that were pushed
out of the queue by passages which have been merged away.

This commit increases the size of the priority queue to try and account for
overlapping passages that will subsequently be merged together.
2022-12-01 12:25:29 +00:00
Luca Cavanna bd168ac2a8 Add changes entry for #11985 2022-11-30 10:13:39 +01:00
Luca Cavanna 343d888b30
ExitableTerms to override getMin and getMax (#11985)
ExitableTerms should not iterate through the terms to retrieve min and max when the wrapped implementation has the values cached (e.g. FieldsReader, OrdsFieldReader)
2022-11-30 10:06:31 +01:00
Alan Woodward 0cc6f69536
Give OffsetsRetrievalStrategy implementations public constructors (#11983)
OffsetsFromMatchIterator and OffsetsFromPositions both have package-
private constructors, which makes them difficult to use as components in a
separate highlighter implementation.
2022-11-28 16:22:46 +00:00
Karl David Wright 5c4896321d Merge branch 'GITHUB-11883' into main
Pulling in changes to address ticket 11883.
2022-11-25 16:32:02 -05:00
Karl David Wright 74e8b94796 Fix for 11883. 2022-11-25 16:17:18 -05:00
Karl David Wright 6dc6b5b0dd As part of GITHUB-11883, develop new primitive Plane constructors to build boundary planes specific for each polygon edge. 2022-11-25 14:56:38 -05:00
Greg Miller 2e83c3b40f
Fix NPE in BinaryRangeFieldRangeQuery when field does not exist or is of wrong type (#11950) 2022-11-25 11:38:41 -08:00
Robert Muir 4e93f29318
fix bad shift amounts and enable check (#11979) 2022-11-25 11:47:25 -05:00
Robert Muir 545c93a394
fix use of wrong array toString() method in test, enable check (#11978) 2022-11-25 11:47:04 -05:00
Robert Muir 4885b5f856
fix use of wrong array equals() method in test, enable check (#11977) 2022-11-25 11:46:48 -05:00
Robert Muir f4286493d1
fix variable assigned to itself in test and enable check (#11980) 2022-11-25 11:45:45 -05:00
Karl David Wright b5f94b6754 Add test that tweaks identical planes in intersections bug 2022-11-25 07:40:45 -05:00
Karl David Wright b5dd71198d Refactor, restoring isWithinSection and making sure it is properly called. 2022-11-24 02:47:06 -05:00
Shubham Chaudhary b15ace46b2
Remove QueryTimeout#isTimeoutEnabled method and move check to caller (#11954)
Co-authored-by: Shubham <cshbha@amazon.com>
2022-11-24 16:37:20 +01:00
Adrien Grand 28576eb99d Fix precommit. 2022-11-24 11:44:21 +01:00
Simon Cooper 135f3fab41
Ensure collections are properly sized on creation (#11942)
A few other optimisations along the way
2022-11-24 11:20:04 +01:00
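
A minimal sketch of what sizing a collection on creation can look like in plain Java. The helper names are hypothetical and the capacity formula is an assumption based on `HashMap`'s default load factor of 0.75; the commit itself may use different utilities.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class PresizedCollections {
    /** Initial capacity that holds expectedEntries without a resize at load factor 0.75. */
    static int hashCapacityFor(int expectedEntries) {
        return (int) Math.ceil(expectedEntries / 0.75);
    }

    /** Builds a word-to-length map whose table is allocated once. */
    static Map<String, Integer> lengths(List<String> words) {
        Map<String, Integer> result = new HashMap<>(hashCapacityFor(words.size()));
        for (String word : words) {
            result.put(word, word.length());
        }
        return result;
    }

    /** Builds a list whose backing array is allocated once at its final size. */
    static List<Integer> squares(int n) {
        List<Integer> out = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            out.add(i * i);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(lengths(List.of("a", "bcd")));
        System.out.println(squares(3));
    }
}
```

Passing the expected entry count directly to `new HashMap<>(n)` is a common mistake: the map resizes once it passes 75% of its capacity, so the requested capacity has to be scaled up as shown.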
Karl David Wright 839dfb5a2d More refactoring work, and fix a distance calculation. 2022-11-23 23:36:15 -05:00
Karl David Wright 5e4623af1f For 11965, add structural changes that would allow intersection calls to also be O(log(n)). Disabled though because test failures are the result of enabling it - work ongoing. 2022-11-23 15:07:57 -05:00
Robert Muir d3fe435be6
Invert error-prone configuration to be allow-list vs deny-list (#11970)
This does not change the semantics or performance of our setup.

Instead, it explicitly enables checks that we want vs disabling checks
that we don't want.

Also reordered checks to match the error-prone website list of checks
for easier maintenance.

It is now clear that many useless checks are enabled; we can disable
some of them and try to get the performance reasonable.
2022-11-23 14:53:27 -05:00
Karl David Wright 482f8251ff More work related to 11965: Improve performance of nearestDistance queries somewhat by removing unnecessary code. 2022-11-23 12:21:38 -05:00
Dawid Weiss 326142c485
Add a note about gradle checks being possibly a subset of all validation checks. (#11966) 2022-11-23 12:19:28 +01:00
Adrien Grand 802774641a
Enforce VectorValues.cost() is equal to size(). (#11962)
`VectorValues` has a `cost()` method that reports an approximate number of
documents that have a vector, but also a `size()` method that reports the
accurate number of vectors in the field. Since KNN vectors only support
single-valued fields we should enforce that `cost()` returns the `size()`.
2022-11-23 11:05:00 +01:00
Adrien Grand 469547e909
No longer announce releases on general@l.a.o. (#11967)
This mailing-list is deprecated.
2022-11-23 10:55:58 +01:00
Adrien Grand 20c1ba5d9a
Remove VectorValues#EMPTY. (#11961)
This instance is illegal as it reports a number of dimensions equal to zero.
2022-11-23 10:52:12 +01:00