36502 Commits

Author SHA1 Message Date
Luca Cavanna
082c49a9ef
Update javadocs for QueryTimeout (#12272)
QueryTimeout was introduced together with ExitableDirectoryReader but is
now also optionally set to the IndexSearcher to wrap the bulk scorer
with a TimeLimitingBulkScorer. Its javadocs need updating.
2023-05-09 11:27:47 +02:00
Luca Cavanna
10bad40ed3
Make query timeout members final in ExitableDirectoryReader (#12274)
There are a couple of places in the Exitable wrapper classes where
queryTimeout is set within the constructor and never modified. This
commit makes such members final.
2023-05-09 11:27:06 +02:00
Luca Cavanna
1cd9c1d66a add missing changelog entry for #12220 2023-05-09 10:57:28 +02:00
Luca Cavanna
67bb384f72 add missing changelog entry for #12260 2023-05-09 10:52:03 +02:00
Luca Cavanna
9579d2de76 Move changes entry for #12270 to 9.7.0 section 2023-05-09 10:28:22 +02:00
Armin Braun
add9aba16d
Don't generate stacktrace in CollectionTerminatedException (#12270)
CollectionTerminatedException is always caught and never exposed to users so there's no point in filling
in a stack-trace for it.
2023-05-09 10:18:52 +02:00
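The idea in the commit above can be illustrated with a minimal sketch. It assumes a hypothetical `TerminationSignal` class (not Lucene's actual `CollectionTerminatedException`) and uses the protected four-argument `RuntimeException` constructor to disable stack-trace capture; the commit itself may achieve this differently.

```java
final class StacklessExceptionDemo {
  /** Hypothetical control-flow signal; a stand-in, not Lucene's actual class. */
  static final class TerminationSignal extends RuntimeException {
    TerminationSignal() {
      // message = null, cause = null, enableSuppression = false, writableStackTrace = false
      super(null, null, false, false);
    }
  }

  /** Throws and catches the signal, returning how many stack frames it recorded. */
  static int recordedFrames() {
    try {
      throw new TerminationSignal();
    } catch (TerminationSignal e) {
      return e.getStackTrace().length;
    }
  }
}
```

Because the stack trace is not writable, the JVM skips the stack walk at construction time, which is the cost the commit aims to avoid for an exception that is always caught internally.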
Jonathan Ellis
9a7efe92c0
allocate one NeighborQueue per search for results (#12255) 2023-05-08 17:22:58 -04:00
Michael Sokolov
a39885fdab
GITHUB-12224: remove KnnGraphTester (moved to luceneutil) (#12238) 2023-05-08 10:12:36 -04:00
Uwe Schindler
397c2e547a
Fix MMapDirectory documentation for Java 20 (#12265) 2023-05-05 12:04:38 +02:00
Luca Cavanna
caeabf3930
Fix SynonymQuery equals implementation (#12260)
The term member of TermAndBoost used to be a Term instance and became a
BytesRef with #11941, which means its equals impl won't take the field
name into account. The SynonymQuery equals impl needs to be updated
accordingly to take the field into account as well, otherwise synonym
queries with same term and boost across different fields are equal which
is a bug.
2023-05-03 11:27:33 +02:00
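A minimal sketch of the equality problem described above, assuming a hypothetical `FieldAwareSynonyms` class that stores terms without their field name (as the `BytesRef`-based `TermAndBoost` does). It is not the actual `SynonymQuery` code; it only shows why the field has to take part in `equals` and `hashCode`.

```java
import java.util.Map;
import java.util.Objects;

final class FieldAwareSynonyms {
  private final String field;
  // Terms carry no field name of their own, so they cannot distinguish fields.
  private final Map<String, Float> boostByTerm;

  FieldAwareSynonyms(String field, Map<String, Float> boostByTerm) {
    this.field = Objects.requireNonNull(field);
    this.boostByTerm = Map.copyOf(boostByTerm);
  }

  @Override
  public boolean equals(Object other) {
    if (this == other) return true;
    if (!(other instanceof FieldAwareSynonyms)) return false;
    FieldAwareSynonyms that = (FieldAwareSynonyms) other;
    // Comparing the field explicitly keeps same-term queries on different fields distinct.
    return field.equals(that.field) && boostByTerm.equals(that.boostByTerm);
  }

  @Override
  public int hashCode() {
    return Objects.hash(field, boostByTerm);
  }
}
```

With this shape, two instances built from the same term and boost but different field names compare as unequal, which is the behaviour the fix restores.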
Jonathan Ellis
3c163745bb Use HashMap (was TreeMap) for OnHeapHnswGraph neighbors 2023-04-30 17:59:39 -04:00
Patrick Zhai
1fa2be90ea Tidy the main branch 2023-04-26 21:21:57 -07:00
Alan Woodward
7374c200a1 Add next minor version 9.7.0 2023-04-26 16:44:47 +01:00
Christoph Büscher
f45e096304
Add ordering of files in compound files (#12241)
Today there is no specific ordering of how files are written to a compound file.
The current order is determined by iterating over the set of file names in
SegmentInfo, which is undefined. This commit changes to an order based
on file size. Colocating data from files that are smaller (typically metadata
files like terms index, field info etc...) but accessed often can help when
parts of these files are held in cache.
2023-04-26 14:01:02 +01:00
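The size-based ordering described above can be sketched as follows. This assumes smallest files come first and breaks ties by file name; both details, along with the `CompoundFileOrder` class and the sample file names, are illustrative assumptions rather than the commit's actual code.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

final class CompoundFileOrder {
  /** Returns file names sorted by ascending size, with the name as an assumed tie-breaker. */
  static List<String> orderBySize(Map<String, Long> sizeByName) {
    List<String> names = new ArrayList<>(sizeByName.keySet());
    names.sort(
        Comparator.comparingLong((String name) -> sizeByName.get(name))
            .thenComparing(Comparator.naturalOrder()));
    return names;
  }
}
```

Unlike iterating over a set of names, this order is deterministic, and the small metadata files end up next to each other at the start of the compound file.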
Luca Cavanna
b0befef912
QueryProfilerWeight to extend FilterWeight (#12242)
QueryProfilerWeight should override matches and delegate to the
subQueryWeight. Another way to fix this issue is to make it extend
ProfileWeight and override only methods that need to have a different
behaviour than delegating to the sub weight.
2023-04-26 10:24:57 +02:00
Alessandro Benedetti
4deb0003c4 Word2VecSynonymFilter constructor null check (#12169) 2023-04-24 17:28:12 +02:00
Daniele Antuzi
1f4f2bf509
Introduced the Word2VecSynonymFilter (#12169)
Co-authored-by: Alessandro Benedetti <a.benedetti@sease.io>
2023-04-24 13:35:26 +02:00
Peter Gromov
5e0761eab5 remove timeout dependency from TestHunspell.testSuggestionOrderStabilityOnDictionaryEditing 2023-04-23 21:16:56 +02:00
Peter Gromov
025dfec2dd
Hunspell: reduce suggestion set dependency on the hash table order (#12239)
* Hunspell: reduce suggestion set dependency on the hash table order

When adding words to a dictionary, suggestions for other words shouldn't change unless they're directly related to the added words.
But before, GeneratingSuggester selected 100 best first matches from the hash table, whose order can change significantly after adding any unrelated word.
That resulted in unexpected suggestion changes on seemingly unrelated dictionary edits.
2023-04-23 16:51:17 +02:00
Stefan Vodita
2e7426961b
Remove statement that SSDV facets aren't hierarchical (#12232) 2023-04-21 18:40:08 -04:00
Peter Gromov
60c9039d9f mention "GITHUB#12220: Hunspell: disallow hidden title-case entries from compound middle/end" CHANGES.txt 2023-04-21 15:17:33 +02:00
Usman Shaikh
bed07c6b02
Update Javadoc comment to mention gradle instead of ant (#12201) 2023-04-18 22:14:19 -07:00
Houston Putman
08f30f82b4
Cleanup NOTICE.txt (#12227)
- Ant is no longer used as the build system for Lucene
- JUnit is not packaged in a Lucene release
- The Float16Converter was removed before the PR it was used in was merged:
  https://github.com/apache/lucene-solr/pull/2108
2023-04-18 15:58:09 -04:00
Kartik Ganesh
3813f5ab7c
Change the access modifier for the "expert" readLatestCommit API to public. (#12229)
This change also includes a unit test for this functionality.

Signed-off-by: Kartik Ganesh <gkart@amazon.com>
2023-04-18 14:38:35 -04:00
Andrey Bozhko
2d0dc6407a
Avoid redundant copies of BytesRef when constructing new Term (#12234) 2023-04-15 22:44:14 -07:00
Vigya Sharma
4e88118a35
Fix typo in CheckJoinIndex (#12231) 2023-04-14 14:06:19 -07:00
Marcus
2d7908e3c9
Explain term automaton queries (#12208) 2023-04-08 16:09:42 -07:00
Patrick Zhai
c31017589b
Remove a test in TestDocumentsWriterDeleteQueue (#12223) 2023-04-04 10:49:14 -07:00
Peter Gromov
56aef7265a
hunspell: disallow hidden title-case entries from compound middle/end (#12220)
if we only have custom-case uART and capitalized UART, we shouldn't accept StandUart as a compound (although we keep hidden "Uart" dictionary entries for internal purposes)
2023-04-03 20:06:58 +02:00
Adrien Grand
56e65919b1
Adjust DWPT pool concurrency to the number of cores. (#12216)
After upgrading Elasticsearch to a recent Lucene snapshot, we observed a few
indexing slowdowns when indexing with low numbers of cores. This appears to be
due to the fact that we lost too much of the bias towards larger DWPTs in
apache/lucene#12199. This change tries to add back more ordering by adjusting
the concurrency of `DWPTPool` to the number of cores that are available on the
local node.
2023-03-31 15:07:48 +02:00
Greg Miller
172dfaf867 changes entry for GH#12212 2023-03-29 11:09:22 -07:00
Frederic Thevenet
df1b0baa69
Fixes Searches made via DrillSideways may miss documents that should match the query (#12212) 2023-03-29 11:05:58 -07:00
Uwe Schindler
b84b360f58
Upgrade forbiddenapis to version 3.5 (#12215)
Upgrade forbiddenapis to version 3.5.  This tones down some verbose warnings printed while checking Java 19 and Java 20 sourcesets for the MR-JAR
2023-03-27 13:30:22 +02:00
Hongyu Yan
a6475cecbf
Fix ordered intervals query over interleaved terms (#12214)
Given an input text 'A B A C A B C' and search ORDERED(A, B, C), we should 
retrieve hits [0,3] and [4,6]; currently [4,6] is skipped.

After finding the first interval [0, 3], the subintervals will become A[0,0], B[1,1], 
C[3,3]; then the algorithm will try to minimize it and the subintervals will 
become: A:[2,2], B:[5,5], C:[3,3] (after finding 5 > 3 it breaks the minimization)

And when finding next interval, it will do advance(B) before checking whether 
it is after A(the do-while loop), so subintervals will become A[2,2], B[inf, inf], 
C[3,3] and return NO_MORE_INTERVAL.

This commit instead continues advancing subintervals from where the last
`nextInterval` call stopped, rather than always advancing all subintervals.
2023-03-27 09:18:33 +01:00
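To check the expected hits in the example above, here is a brute-force reference sketch. It is not the iterator-based algorithm the commit changes; it assumes each term's positions are given as a sorted array and simply computes the minimal ordered intervals directly. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class OrderedIntervalsReference {
  /**
   * positions[t] holds the sorted positions of the t-th term.
   * Returns minimal intervals [start, end] in which the terms occur in order.
   */
  static List<int[]> minimalOrderedIntervals(int[][] positions) {
    List<int[]> hits = new ArrayList<>();
    int lastStart = -1;
    for (int end : positions[positions.length - 1]) {
      int bound = end;
      boolean complete = true;
      // Walk backwards, picking the latest position of each earlier term before the bound.
      for (int t = positions.length - 2; t >= 0 && complete; t--) {
        int best = -1;
        for (int p : positions[t]) {
          if (p >= bound) break;
          best = p;
        }
        complete = best >= 0;
        bound = best;
      }
      // A start we already emitted means this interval contains an earlier, shorter one.
      if (complete && bound > lastStart) {
        hits.add(new int[] {bound, end});
        lastStart = bound;
      }
    }
    return hits;
  }
}
```

For the text 'A B A C A B C', the positions are A = {0, 2, 4}, B = {1, 5}, C = {3, 6}, and the sketch yields the two intervals named in the commit message.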
Adrien Grand
0782535017
Fully reuse postings enums when flushing sorted indexes. (#12206)
Currently we're only half reusing postings enums when flushing sorted indexes
as we still create new wrapper instances every time, which can be costly with
fields that have many terms.
2023-03-16 13:51:33 +01:00
Patrick Zhai
d3b6ef3c86
Refactor part of IndexFileDeleter and ReplicaFileDeleter into a common utility class (#12126) 2023-03-15 20:51:49 -07:00
Adrien Grand
f324204019
Reduce contention in DocumentsWriterPerThreadPool. (#12199)
Obtaining a DWPT and putting it back into the pool is subject to contention.
This change reduces contention by using 8 sub pools that are tried sequentially.
When applied on top of #12198, this reduces the time to index geonames with 20
threads from ~19s to ~16-17s.
2023-03-15 13:17:40 +01:00
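A rough sketch of the sub-pool idea described above, under several assumptions: the `SubPooledFreeList` class is hypothetical, each sub pool is guarded by its own lock, and callers first try every sub pool without blocking before falling back to blocking. The actual `DocumentsWriterPerThreadPool` may differ in all of these details.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;

final class SubPooledFreeList<T> {
  private final List<Deque<T>> pools = new ArrayList<>();
  private final List<ReentrantLock> locks = new ArrayList<>();

  SubPooledFreeList(int subPools) {
    for (int i = 0; i < subPools; i++) {
      pools.add(new ArrayDeque<>());
      locks.add(new ReentrantLock());
    }
  }

  /** Puts an item into the first sub pool whose lock is free, else blocks on sub pool 0. */
  void release(T item) {
    for (int i = 0; i < pools.size(); i++) {
      if (locks.get(i).tryLock()) {
        try {
          pools.get(i).addLast(item);
          return;
        } finally {
          locks.get(i).unlock();
        }
      }
    }
    locks.get(0).lock();
    try {
      pools.get(0).addLast(item);
    } finally {
      locks.get(0).unlock();
    }
  }

  /** Tries the sub pools sequentially; returns null if none currently holds an item. */
  T acquire() {
    // First pass: skip sub pools that another thread is using.
    for (int i = 0; i < pools.size(); i++) {
      if (locks.get(i).tryLock()) {
        try {
          T item = pools.get(i).pollFirst();
          if (item != null) return item;
        } finally {
          locks.get(i).unlock();
        }
      }
    }
    // Second pass: block on each sub pool so a free item is not missed.
    for (int i = 0; i < pools.size(); i++) {
      locks.get(i).lock();
      try {
        T item = pools.get(i).pollFirst();
        if (item != null) return item;
      } finally {
        locks.get(i).unlock();
      }
    }
    return null;
  }
}
```

Splitting one lock into several means that threads contending on the same pool usually find a free sub pool on the first pass instead of queueing behind each other.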
Adrien Grand
805eb0b613
Use radix sort to sort postings when index sorting is enabled. (#12114)
This switches to LSBRadixSorter instead of TimSorter to sort postings whose
index options are `DOCS`. On a synthetic benchmark this yielded barely any
difference in the case when the index order is the same as the sort order, or
reverse, but almost a 3x speedup for writing postings in the case when the
index order is mostly random.
2023-03-15 11:56:45 +01:00
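For readers unfamiliar with the technique, the following is a minimal sketch of least-significant-byte radix sort on non-negative integers such as doc IDs. It is a generic textbook version written for illustration, not Lucene's `LSBRadixSorter`; the class name is hypothetical, and negative values are assumed not to occur.

```java
final class LsbRadixSortSketch {
  /** Sorts non-negative ints in place using four stable counting passes of 8 bits each. */
  static void sort(int[] values) {
    int[] src = values;
    int[] dst = new int[values.length];
    for (int shift = 0; shift < 32; shift += 8) {
      // Histogram of the current byte, offset by one to turn counts into start indexes.
      int[] starts = new int[257];
      for (int v : src) {
        starts[((v >>> shift) & 0xFF) + 1]++;
      }
      for (int b = 0; b < 256; b++) {
        starts[b + 1] += starts[b];
      }
      // Stable scatter: equal bytes keep the order produced by earlier passes.
      for (int v : src) {
        dst[starts[(v >>> shift) & 0xFF]++] = v;
      }
      int[] tmp = src;
      src = dst;
      dst = tmp;
    }
    // After an even number of passes, the sorted data is back in the original array.
  }
}
```

The running time is linear in the number of values regardless of their initial order, which is consistent with the commit's observation that the gain shows up mainly when the index order is mostly random.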
Adrien Grand
d407edf4b8
Reduce contention in DocumentsWriterFlushControl. (#12198)
lucene-util's `IndexGeoNames` benchmark is heavily contended when running with
many indexing threads, 20 in my case. The main offender is
`DocumentsWriterFlushControl#doAfterDocument`, which runs after every index
operation to update doc and RAM accounting.

This change reduces contention by only updating RAM accounting if the amount of
RAM consumption that has not been committed yet by a single DWPT is at least
0.1% of the total RAM buffer size. This effectively batches updates to RAM
accounting, similarly to what happens when using `IndexWriter#addDocuments` to
index multiple documents at once. Since updates to RAM accounting may be
batched, `FlushPolicy` can no longer distinguish between inserts, updates and
deletes, so all 3 methods got merged into a single one.

With this change, `IndexGeoNames` goes from ~22s to ~19s and the main offender
for contention is now `DocumentsWriterPerThreadPool#getAndLock`.

Co-authored-by: Simon Willnauer <simonw@apache.org>
2023-03-15 11:39:40 +01:00
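The batching described above can be sketched as follows. The `BatchedRamCounter` class is hypothetical and much simpler than `DocumentsWriterFlushControl`; it only shows a writer accumulating bytes locally and touching the shared counter once its pending amount reaches 0.1% of the RAM buffer size.

```java
import java.util.concurrent.atomic.AtomicLong;

final class BatchedRamCounter {
  private final AtomicLong sharedBytes; // contended, global accounting
  private final long threshold;         // 0.1% of the total RAM buffer size
  private long pendingBytes;            // confined to one writer, no contention

  BatchedRamCounter(AtomicLong sharedBytes, long ramBufferBytes) {
    this.sharedBytes = sharedBytes;
    this.threshold = Math.max(1, ramBufferBytes / 1000);
  }

  /** Records bytes used by one operation; returns true if the shared counter was updated. */
  boolean add(long bytes) {
    pendingBytes += bytes;
    if (pendingBytes < threshold) {
      return false; // keep batching locally
    }
    sharedBytes.addAndGet(pendingBytes);
    pendingBytes = 0;
    return true;
  }
}
```

With a 1,000,000-byte buffer the threshold is 1,000 bytes, so three 400-byte documents cause a single update of the shared counter instead of three.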
Lu Xugang
62e175bf4f
Unrelated code in TestIndexSortSortedNumericDocValuesRangeQuery (#12153) 2023-03-15 15:22:32 +08:00
Marcus
dfd9e0fe97
Remove the Now Unused Class pointInPolygon. (#12159)
Removes the unused Tessellator.pointInPolygon method.
2023-03-14 09:16:02 -05:00
Jasir KT
4851bd74f4
Fix boost missing in MultiFieldQueryParser (#12202)
When using boost along with any of fuzzy, wildcard, regexp, range or prefix queries, the boost is not applied.
2023-03-13 08:27:42 -04:00
Uwe Schindler
e4d8a5c5cb
Implement MMapDirectory with Java 20 Project Panama Preview API (#12188) 2023-03-09 21:27:31 +01:00
Jasir KT
96efb34d00
Fix Slop Issue in MultiFieldQueryParser (#12196)
In Lucene 5.4.0, commit 62313b83ba9c69379e1f84dffc881a361713ce9 introduced some changes for immutability of queries. The setBoost() function was replaced with new BoostQuery(), but BoostQuery is not handled in the setSlop() function. This commit adds the handling of BoostQuery in the setSlop() function.
2023-03-08 11:16:09 -05:00
Greg Miller
0651d25713
Fixup TestLongValueFacetCounts after GITHUB#11744. (#12192)
GH#11744 deprecated LongValueFacetCounts#getTopChildrenSortByCount in favor
of the standard Facets#getTopChildren. The issue is that #getTopChildrenSortByCount
didn't do any input validation and allowed for topN == 0, while #getTopChildren
does input validation. Randomized testing could produce topN values of 0, which
resulted in failed tests. This addresses the tests.
2023-03-06 18:13:03 -08:00
Greg Miller
afd3a7efbe
Remove LongValueFacetCounts#getTopChildrenSortByCount since it provides redundant functionality (#11744) 2023-03-06 12:12:23 -08:00
Tyler Bertrand
c514089d66
Gradle optimizations (#12150)
* Define inputs and outputs for task validateJarLicenses
  * Lazily configure validateJarLicenses
* Move functionality from copyTestResources task into processTestResources task
  * Lazily configure processTestResources
  * Altered TestCustomAnalyzer.testStopWordsFromFile() to find resources in updated location
* Resolve "overlapping output" issue preventing processTestResources from being cached
* Provide system properties from CommandLineArgumentProviders
  * Configure certain system properties as inputs to take advantage of UP-TO-DATE checking
  * Applies the correct pathing strategies to take full advantage of caching even if builds are executed from different locations on disk
* Make validateSourcePatterns task cacheable by removing .gradle directory from its input
2023-03-06 19:17:37 +01:00
Greg Miller
b4f969c197
Better PostingsEnum reuse in MultiTermQueryConstantScoreBlendedWrapper (#12179) 2023-03-06 09:09:52 -08:00
Christine Poerschke
3bd06b1cb9
GITHUB-12181: fix false-positive TestKnnFloatVectorQuery.testDocAndScoreQueryBasics() failure (#12182) 2023-03-06 15:29:36 +00:00
Kaival Parikh
e0d92eef98
Concurrent rewrite for KnnVectorQuery (#12160)
- Reduce overhead of non-concurrent search by preserving original execution
- Improve readability by factoring into separate functions

---------

Co-authored-by: Kaival Parikh <kaivalp2000@gmail.com>
2023-03-04 01:12:11 -08:00