lucene

Commit Graph

Author	SHA1	Message	Date
Robert Muir	ece8ea715c	Fix ExitableDirectoryReader sampling constants to be power-of-2 (#11850) If it's performance sensitive enough that we should do sampling, then we should avoid integer division too.	2022-10-15 12:05:15 -04:00
Benjamin Trent	a7369d7f59	Remove cancellation check on every vector (#11843) We recently introduced support for kNN vectors to `ExitableDirectoryReader`. Previously, we checked for cancellation not only on sampled calls `advance`, but on every single call to `vectorValue`. This can cause significant overhead when a query scans many vector values (for example the case where you're doing an exact scan and computing a vector similarity for every matching document). This PR removes the cancellation checks on `vectorValue`, since having them on `advance` is already enough.	2022-10-13 09:29:33 -07:00
Marc D'Mello	3a608995a1	GITHUB-11761 (part 2): Fix unit tests to cleanly work with new TieredMergePolicy delete pct default (#11841) Co-authored-by: Marc D'Mello <dmellomd@amazon.com>	2022-10-13 15:18:50 +02:00
Robert Muir	83891d9a61	WrapperDownloader: add retries for network blips around connect(), too (#11846) Add retries for common issues such as connect timeout, etc. This won't solve the problem of read-timeouts happening around the actual transferTo, but it is an easy incremental improvement.	2022-10-13 07:21:34 -04:00
Robert Muir	5e26b36ac8	Mark TestLongBitSet.testHugeCapacity @Monster as it requires a lot of memory (#11844) Closes #11842	2022-10-13 07:20:21 -04:00
Peter Gromov	ab50fe640b	[hunspell] fix TestPerformance measurement after millis->nanos conversion	2022-10-12 11:29:07 +02:00
Robert Muir	4c434b7089	make 'gradle coverage' print test coverage summaries. (#11837) Currently, this task is too silent and just writes HTML reports. It is a nice improvement to print the summary to the console. Before: ``` > Task :lucene:analysis:icu:jacocoTestReport Code coverage report at: /home/rmuir/workspace/lucene/lucene/analysis/icu/build/reports/jacoco/test/html. ``` After: ``` > Task :lucene:analysis:icu:jacocoTestReport Code coverage report at: /home/rmuir/workspace/lucene/lucene/analysis/icu/build/reports/jacoco/test/html. > Task :lucene:analysis:icu:jacocoLogTestCoverage Test Coverage: - Class Coverage: 100% - Method Coverage: 87.9% - Branch Coverage: 82.7% - Line Coverage: 92.8% - Instruction Coverage: 92.7% - Complexity Coverage: 78.8% ```	2022-10-05 21:46:20 -04:00
Marc D'Mello	d966adcb62	GITHUB-11761: Move minimum TieredMergePolicy delete percentage and change default value (#11831) Move minimum TieredMergePolicy delete percentage from 20% to 5% and change deletePctAllowed default to 20% Co-authored-by: Marc D'Mello <dmellomd@amazon.com>	2022-10-05 15:33:12 -07:00
Uwe Schindler	f54fddc89f	GH-11819: Exclude MR-JAR sourceSet and folders from Idea Sync (#11836)	2022-10-04 11:49:39 +02:00
Alan Woodward	6bd8733fdb	No need to rewrite queries in unified highlighter (#11807) Since QueryVisitor added the ability to signal multi-term queries, the query rewrite call in UnifiedHighlighter has been essentially useless, and with more aggressive rewriting this is now causing bugs like #11490. We can safely remove this call. Fixes #11490	2022-10-03 10:15:40 +01:00
Uwe Schindler	df94e6c005	Clean up MR-JAR build, so we do not have hardcoded "19" everywhere in validation tasks (#11835) As long as sourceSets are named "mainXX", with XX a feature version, we check everything automatically: - ECJ is disabled (we can't do a check without forking ECJ as a separate process using toolkit, we may support this later) - forbiddenapis (we disable checks for missing classes) - errorprone is disabled (errorprone does not work correctly at the moment with forked compiler)	2022-10-02 20:41:46 +02:00
Uwe Schindler	e5a226ec7c	For now only use bundled signatures from minJavaVersion (#11834) # Conflicts: # gradle/validation/forbidden-apis.gradle	2022-10-02 17:54:11 +02:00
Uwe Schindler	aae293437f	Upgrade forbiddenapis to 3.4 (#11834)	2022-10-02 16:42:36 +02:00
Uwe Schindler	7333f0329b	Fix typo in log message (we only support exactly Java 19)	2022-10-02 11:09:58 +02:00
Michael Sokolov	9c12bec4a4	DOAP changes for release 9.4.0	2022-09-30 18:03:35 -04:00
Greg Miller	44b4602776	TermInSetQuery optimization when all docs in a field match a term (#11828)	2022-09-29 06:59:59 -07:00
Greg Miller	367cd2ea95	Associate correct PR with DrillSideway change in CHANGES	2022-09-29 05:48:29 -07:00
Greg Miller	d02ba3134f	DrillSideways optimizations (#11803) DrillSidewaysScorer now breaks up first- and second-phase matching and makes use of advance when possible over nextDoc.	2022-09-29 05:22:30 -07:00
Uwe Schindler	6f25c79db3	Update smoketester on main to optionally run with Java 19	2022-09-27 12:24:24 +02:00
Uwe Schindler	c2058d71a1	Let smoketester initialize local settings before running any checks (like Github CI or Jenkins) (#11826)	2022-09-27 11:45:38 +02:00
Uwe Schindler	1f30800cb5	GH-11819: Fix the Eclipse part to support development of the MR-JAR: (#11823) - by default, Lucene will only generate a config for Java 17 (or 11 in 9.x), without the MR-JAR sourceSets - if passed -Peclipse.javaVersion=19, it will include matching sourcesets and set compiler version to given version in classpath	2022-09-27 11:11:49 +02:00
Ignacio Vera	78b58b8e2e	Build SpatialVisitor once per index (#11825) Address a performance regression on polygon queries using LatLonPoint field.	2022-09-27 10:51:49 +02:00
Greg Miller	971ae01164	Fix tie-break bug in various Facets implementations (#11768)	2022-09-26 15:05:57 -07:00
Greg Miller	734841d6c0	Optimize MultiTermQueryConstantScoreWrapper for case when a term matches all docs in a segment. (#11738)	2022-09-26 10:39:47 -07:00
Greg Miller	ac12cd9f17	FacetsCollector#collect is no longer final to allow extension (#11804)	2022-09-26 10:15:31 -07:00
Uwe Schindler	d943b76215	GITHUB-912: Remove deprecated APIs; fix link	2022-09-26 18:36:09 +02:00
Uwe Schindler	3b9c728ab5	MR-JAR rewrite of MMapDirectory with JDK-19 preview Panama APIs (>= JDK-19-ea+23) (#912) This uses Gradle's auto-provisioning to compile Java 19 classes and build a multi-release JAR from them. Please make sure to regenerate gradle.properties (delete it) or change "org.gradle.java.installations.auto-download" to "true"	2022-09-26 15:22:04 +02:00
Adrien Grand	432296d967	Fix codec name in index header for Lucene94FieldInfosFormat. (#11818)	2022-09-26 14:56:30 +02:00
Dawid Weiss	6b82be5f11	Regenerate sources after dependency updates. (#11817)	2022-09-25 18:09:30 +02:00
Dawid Weiss	5d121ce44c	Upgrade several build dependencies. (#11812) * Upgrade several build dependencies. * Update error prone rules (those are off but they do trigger warnings/errors) * A few corrections I made before I turned off new warnings. Let's do another issue to fix them.	2022-09-25 17:10:22 +02:00
Robert Muir	15f3743f02	Remove Operations.isFinite (#11813) This method is recursive: to avoid eating too much stack we apply a small limit. This means it can't really be used on any largish automata without hitting an exception. But the benefit of knowing finite vs infinite in AutomatonTermsEnum is minor: let's not auto-compute this. FuzzyQuery still gets the finite optimization because it's finite by definition. PrefixQuery is always infinite. Wildcard/Regex just assume infinite which is safe to do. Remove the auto-computation and the "trillean" Boolean parameter. If you don't know that your automaton is finite, pass false to CompiledAutomaton, it is safe. Move this method to AutomatonTestUtil so we can still use it in test asserts. Closes #11809	2022-09-24 10:51:04 -04:00
Dawid Weiss	54fba99cb1	Upgrade google java format and apply tidy (#11811)	2022-09-24 15:40:27 +02:00
Dawid Weiss	8bdfa90ea9	Fix and simplify the test (#11734).	2022-09-24 12:51:01 +02:00
Alan Woodward	188a78d769	Don't try to highlight very long terms (#11808) The UnifiedHighlighter can throw exceptions when highlighting terms that are longer than the maximum size the DaciukMihovAutomatonBuilder accepts. Rather than throwing a confusing exception, we can instead filter out the long terms when building the MemoryIndexOffsetStrategy. Very long terms are likely to be junk input in any case.	2022-09-24 11:26:16 +01:00
Luke Kot-Zaniewski	3a04aa44c2	Fix repeating token sentence boundary bug (#11734) Signed-off-by: lkotzaniewsk <lkotzaniewsk@bloomberg.net> Co-authored-by: Dawid Weiss <dawid.weiss@gmail.com>	2022-09-23 12:59:46 +02:00
jianping weng	5b24a233bd	LUCENE-10425: speed up IndexSortSortedNumericDocValuesRangeQuery#BoundedDocSetIdIterator construction using bkd binary search (#687)	2022-09-22 08:51:13 +02:00
Shai Erera	bcc116057d	Minor refactoring and cleanup to taxonomy index code (#11775)	2022-09-21 13:08:33 +03:00
Julie Tibshirani	add309bb40	Mute TestKnnVectorQuery#testFilterWithSameScore while we work on a fix	2022-09-20 15:48:56 -07:00
Luca Cavanna	4eaebee686	Guard FieldExistsQuery against null pointers (#11794) FieldExistsQuery checks if there are points for a certain field, and then retrieves the corresponding point values. When all documents that had points for a certain field have been deleted from a certain segment, as well as merged away, field info may report that there are points yet the corresponding point values are null. With this change we add a null check in FieldExistsQuery. Long term, we will likely want to prevent this situation from happening. Relates #11393	2022-09-20 15:38:38 +02:00
Adrien Grand	6c46662b43	Fix handling of ghost fields in string sorts. (#11792) Introduction of dynamic pruning for string sorts (#11669) introduced a bug with string sorts and ghost fields, triggering a `NullPointerException` because the code assumes that `LeafReader#terms` is not null if the field is indexed according to field infos. This commit fixes the issue and adds tests for ghost fields across all sort types. Hopefully we can simplify and remove the null check in the future when we improve handling of ghost fields (#11393).	2022-09-20 13:49:52 +02:00
Jan Høydahl	00a8112d97	LUCENE-10365 Wizard changes contributed from Solr (#591)	2022-09-20 12:07:42 +02:00
Alex	26d6063ec3	GitHub Workflows security hardening (#11789)	2022-09-20 11:28:07 +02:00
Ignacio Vera	ecb0ba542b	Improve tessellator performance by delaying calls to the method #isIntersectingPolygon (#11786)	2022-09-20 07:15:38 +02:00
Michael Sokolov	accc3bdcfa	update DOAP and releaseWizard to reflect migration to github (#11747)	2022-09-19 13:53:26 -04:00
Michael Sokolov	07af358f90	Diversity check bugfix (#11781) * Fixes bug in HNSW diversity checks introduced in LUCENE-10577	2022-09-19 11:48:59 -04:00
Michael Sokolov	e69c48b8d9	Fix rare bug in TestKnnVectorQuery when we have multiple segments	2022-09-18 20:21:39 +00:00
Namgyu Kim	451bab300e	GITHUB#11778: Add detailed part-of-speech tag for particle and ending on Nori (#11779)	2022-09-17 00:42:35 +09:00
Adrien Grand	155876a902	LUCENE-10674: Move changes entry to 9.4.	2022-09-16 16:59:42 +02:00
Dawid Weiss	9acc653995	GH-11172: remove WindowsDirectory and native subproject. (#11774)	2022-09-15 16:22:46 +02:00
John Mazanec	0587844742	LUCENE-10674: Ensure BitSetConjDISI returns NO_MORE_DOCS when sub-iterator exhausts. (#1068) Signed-off-by: John Mazanec <jmazane@amazon.com>	2022-09-15 11:21:39 +02:00
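
The sampling idea behind commits ece8ea715c and a7369d7f59 (check for cancellation only on some calls, with a power-of-2 interval so the test needs no integer division) can be sketched roughly as follows. This is an illustration only, not the code from `ExitableDirectoryReader`: the class name `SampledCancellationCheck`, the method `shouldCheck`, and the interval of 16 are all hypothetical.

```java
// Hypothetical sketch; names and the interval are assumptions, not Lucene's code.
class SampledCancellationCheck {
  // Assumed sampling interval. It must be a power of two for the mask trick to work.
  private static final int SAMPLE_INTERVAL = 16;
  private static final int SAMPLE_MASK = SAMPLE_INTERVAL - 1;

  private int calls = 0;

  /** Returns true on every SAMPLE_INTERVAL-th call, starting with the first one. */
  boolean shouldCheck() {
    // For a power-of-two interval, (calls & MASK) == 0 selects the same calls
    // as calls % INTERVAL == 0, using a bitwise AND instead of a division.
    return (calls++ & SAMPLE_MASK) == 0;
  }

  public static void main(String[] args) {
    SampledCancellationCheck sampler = new SampledCancellationCheck();
    int checks = 0;
    for (int i = 0; i < 64; i++) {
      if (sampler.shouldCheck()) {
        checks++; // a real caller would test its timeout or cancellation flag here
      }
    }
    System.out.println("checks performed: " + checks);
  }
}
```

Because the low bits of the counter keep cycling when the `int` wraps around, the mask version stays periodic even after overflow.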

36201 Commits