lucene

Commit Graph

Author	SHA1	Message	Date
Egor Potemkin	d18e3f1d45	Issue #11582 Update Faceting user guide (#12025 ) Update faceting user guide to modern times. Co-authored-by: Egor Potemkin <epotyom@amazon.com>	2022-12-21 12:20:18 -05:00
Francisco Fernández Castaño	57201aa967	Add IntField, LongField, FloatField and DoubleField (#11997 ) This commit adds new IndexableFields that index both points and doc values at once. Closes #11199	2022-12-20 18:19:46 +01:00
Benjamin Trent	1412e559d9	Clean up KNN related backward-codecs changes (#12019 )	2022-12-20 14:04:42 +01:00
Robert Muir	3ac71adbdf	Ban use of Math.fma across the entire codebase (#12014 ) When FMA is not supported by the hardware, these methods fall back to BigDecimal usage which causes them to be 2500x slower. While most hardware in the last 10 years may have the support, out of box both VirtualBox and QEMU don't pass thru FMA support (for the latter at least you can tweak it with e.g. -cpu host or similar to fix this). This creates a terrible undocumented performance trap. Prevent it from sneaking into our codebase.	2022-12-17 08:01:22 -05:00
Andriy Redko	945d7fe027	Upgrade ANTLR to version 4.11.1 (#12016 ) Drop 3.x compatibility (which was pickier at compile-time and prevented slow things from happening). Instead add paranoia to runtime tests, so that they fail if antlr would do something slow in the parsing. This is needed because antlrv4 is a big performance trap: https://github.com/antlr/antlr4/blob/master/doc/faq/general.md "Q: What are the main design decisions in ANTLR4? Ease-of-use over performance. I will worry about performance later." It allows us to move forward with newer antlr but hopefully prevent the associated headaches. Signed-off-by: Andriy Redko <andriy.redko@aiven.io> Co-authored-by: Robert Muir <rmuir@apache.org>	2022-12-15 22:40:35 -05:00
Craig Taverner	3e8ef57e3f	Fix flat polygons incorrectly containing intersecting geometries (#12022 )	2022-12-15 14:56:09 +01:00
Benjamin Trent	11f2bc2056	Fix SimpleTextKnnVectorsReader to handle changes introduced in GITHUB#12004 (#12024 )	2022-12-15 14:49:47 +01:00
Benjamin Trent	72968d30ba	Move byte vector queries into new KnnByteVectorQuery (#12004 )	2022-12-14 09:53:10 +01:00
Robert Muir	9eeab8c4a6	Remove deprecated API in 10.x (#11998 )	2022-12-13 10:32:15 -05:00
Robert Muir	47f8c1baa2	Migrate away from per-segment-per-threadlocals on SegmentReader (#11998 ) Add new stored fields and termvectors interfaces: IndexReader.storedFields() and IndexReader.termVectors(). Deprecate IndexReader.document() and IndexReader.getTermVector(). The new APIs do not rely upon ThreadLocal storage for each index segment, which can greatly reduce RAM requirements when there are many threads and/or segments. Co-authored-by: Adrien Grand <jpountz@gmail.com>	2022-12-13 09:10:21 -05:00
Ignacio Vera	ef5766aa81	Fix algorithm that chooses the bridge between a polygon and a hole (#11988 )	2022-12-13 10:16:53 +01:00
Dawid Weiss	486003833f	Run spotless after javac (#12012 ) (#12015 )	2022-12-13 08:42:04 +01:00
Robert Muir	06f9179295	Enable LongDoubleConversion error-prone check (#12010 )	2022-12-12 20:55:39 -05:00
Greg Miller	e34234ca6c	Remove unnecessary NaN checks from LongRange#verifyAndEncode (#12008 )	2022-12-11 12:55:21 -08:00
Greg Miller	8671e29929	Some minor code cleanup in IndexSortSortedNumericDocValuesRangeQuery (#12003 ) * Leverage DISI static factory methods more over custom DISI impl where possible. * Assert points field is a single-dim. * Bound cost estimate by the cost of the doc values field (for sparse fields).	2022-12-10 12:23:31 -08:00
gf2121	54e00df7f6	Do int compare instead of ArrayUtil#compareUnsigned4 in LatlonPointQueries (#12006 )	2022-12-11 02:30:17 +08:00
gf2121	9ff989ec00	Use ByteArrayComparator to replace Arrays#compareUnsigned in some other places (#11880 )	2022-12-08 23:51:08 +08:00
Alan Woodward	66127f6e69	Add support for stored fields to MemoryIndex (#11999 )	2022-12-08 09:56:24 +00:00
Adrien Grand	a971120d05	Make RandomAccessVectorValues an implementation detail of HNSW implementations rather than a proper API. (#11964 ) `RandomAccessVectorValues` is internally used in our HNSW implementation to provide random access to vectors, both at index and search time. In order to better reflect this, this change does the following: - `RandomAccessVectorValues` moves to `org.apache.lucene.util.hnsw`. - `BufferingKnnVectorsWriter` no longer has a dependency on `RandomAccessVectorValues` and moves to `org.apache.lucene.codecs` since it's more of a utility class for KNN vector file formats than an index API. Maybe we should think of moving it near each file format that uses it instead. - `SortingCodecReader` no longer has a dependency on `RandomAccessVectorValues`. Closes #10623	2022-12-08 08:49:37 +01:00
Adrien Grand	95df7e8109	Generalize range query optimization on sorted indexes to descending sorts. (#11972 ) This generalizes #687 to indexes that are sorted in descending order. The main challenge with descending sorts is that they require being able to compute the last doc ID that matches a value, which would ideally require walking the BKD tree in reverse order, but the API only supports moving forward. This is worked around by maintaining a stack of `PointTree` clones to perform the search.	2022-12-08 08:38:53 +01:00
Benjamin Trent	d0be9ab57c	GITHUB-11830 Better optimize storage for vector connections (#11860 )	2022-12-07 08:51:54 +01:00
Karl David Wright	108462a005	Followup work for #11883	2022-12-03 08:07:10 -05:00
Costin Leau	4eba6a1284	Add exponential growth to TimeLimitingBulkScorer (#11984 ) Increase the timeout check inside TimeLimitBulkScorer at exponential rate. Fix #11676	2022-12-02 09:20:48 -08:00
Dawid Weiss	1f741ff63c	Upgrade gradle to 7.6. (#11993 )	2022-12-02 09:18:38 +01:00
Robert Muir	fad3108b27	fix wrong serialization by ShapeDocValues (#11974 ) Closes #11973	2022-12-01 20:32:42 -05:00
Robert Muir	0a9bb6e2ac	Disable useless error-prone checks (libraries/frameworks we do not use) (#11971 ) These are easy/obvious ones to disable since we don't use the functionality at all: the checks are literally useless. This gives some performance boost to the error-prone, although it is still pretty slow. triage most of the previously disabled checks into TODO, noisy, etc	2022-12-01 08:46:23 -05:00
Alan Woodward	72ff140f5a	Don't let merged passages push out lower-scoring ones (#11990 ) PassageScorer uses a priority queue of size maxPassages to keep track of which highlighted passages are worth returning to the user. Once all passages have been collected, we go through and merge overlapping passages together, but this reduction in the number of passages is not compensated for by re-adding the highest-scoring passages that were pushed out of the queue by passages which have been merged away. This commit increases the size of the priority queue to try and account for overlapping passages that will subsequently be merged together.	2022-12-01 12:25:29 +00:00
Luca Cavanna	bd168ac2a8	Add changes entry for #11985	2022-11-30 10:13:39 +01:00
Luca Cavanna	343d888b30	ExitableTerms to override getMin and getMax (#11985 ) ExitableTerms should not iterate through the terms to retrieve min and max when the wrapped implementation has the values cached (e.g. FieldsReader, OrdsFieldReader)	2022-11-30 10:06:31 +01:00
Alan Woodward	0cc6f69536	Give OffsetsRetrievalStrategy implementations public constructors (#11983 ) OffsetsFromMatchIterator and OffsetsFromPositions both have package- private constructors, which makes them difficult to use as components in a separate highlighter implementation.	2022-11-28 16:22:46 +00:00
Karl David Wright	5c4896321d	Merge branch 'GITHUB-11883' into main Pulling in changes to address ticket 11883.	2022-11-25 16:32:02 -05:00
Karl David Wright	74e8b94796	Fix for 11883.	2022-11-25 16:17:18 -05:00
Karl David Wright	6dc6b5b0dd	As part of GITHUB-11883, develop new primitive Plane constructors to build boundary planes specific for each polygon edge.	2022-11-25 14:56:38 -05:00
Greg Miller	2e83c3b40f	Fix NPE in BinaryRangeFieldRangeQuery when field does not exist or is of wrong type (#11950 )	2022-11-25 11:38:41 -08:00
Robert Muir	4e93f29318	fix bad shift amounts and enable check (#11979 )	2022-11-25 11:47:25 -05:00
Robert Muir	545c93a394	fix use of wrong array toString() method in test, enable check (#11978 )	2022-11-25 11:47:04 -05:00
Robert Muir	4885b5f856	fix use of wrong array equals() method in test, enable check (#11977 )	2022-11-25 11:46:48 -05:00
Robert Muir	f4286493d1	fix variable assigned to itself in test and enable check (#11980 )	2022-11-25 11:45:45 -05:00
Karl David Wright	b5f94b6754	Add test that tweaks identical planes in intersections bug	2022-11-25 07:40:45 -05:00
Karl David Wright	b5dd71198d	Refactor, restoring isWithinSection and making sure it is properly called.	2022-11-24 02:47:06 -05:00
Shubham Chaudhary	b15ace46b2	Remove QueryTimeout#isTimeoutEnabled method and move check to caller (#11954 ) Co-authored-by: Shubham <cshbha@amazon.com>	2022-11-24 16:37:20 +01:00
Adrien Grand	28576eb99d	Fix precommit.	2022-11-24 11:44:21 +01:00
Simon Cooper	135f3fab41	Ensure collections are properly sized on creation (#11942 ) A few other optimisations along the way	2022-11-24 11:20:04 +01:00
Karl David Wright	839dfb5a2d	More refactoring work, and fix a distance calculation.	2022-11-23 23:36:15 -05:00
Karl David Wright	5e4623af1f	For 11965, add structural changes that would allow intersection calls to also be O(log(n)). Disabled though because test failures are the result of enabling it - work ongoing.	2022-11-23 15:07:57 -05:00
Robert Muir	d3fe435be6	Invert error-prone configuration to be allow-list vs deny-list (#11970 ) This does not change the semantics or performance of our setup. Instead, it explicitly enables checks that we want vs disabling checks that we don't want. Also reordered checks to match the error-prone website list of checks for easier maintenance. It is now clear that many useless checks are enabled, we can disable some of them and try to get the performance reasonable.	2022-11-23 14:53:27 -05:00
Karl David Wright	482f8251ff	More work related to 11965: Improve performance of nearestDistance queries somewhat by removing unnecessary code.	2022-11-23 12:21:38 -05:00
Dawid Weiss	326142c485	Add a note about gradle checks being possibly a subset of all validation checks. (#11966 )	2022-11-23 12:19:28 +01:00
Adrien Grand	802774641a	Enforce VectorValues.cost() is equal to size(). (#11962 ) `VectorValues` have a `cost()` method that reports an approximate number of documents that have a vector, but also a `size()` method that reports the accurate number of vectors in the field. Since KNN vectors only support single-valued fields we should enforce that `cost()` returns the `size()`.	2022-11-23 11:05:00 +01:00
Adrien Grand	469547e909	No longer announce releases on general@l.a.o. (#11967 ) This mailing-list is deprecated.	2022-11-23 10:55:58 +01:00


36401 Commits
