Commit Graph

12029 Commits

Author SHA1 Message Date
Simon Willnauer 8480329213
LUCENE-9473: Ensure merges are stopped during abort merges (#1772)
We need to disable merges while we wait for running merges, since
IW calls a timed wait on its lock, which releases the monitor for the time
being and allows new merges to be registered unless we disable them.
2020-08-24 09:15:42 +02:00
David Smiley e1392c7440
LUCENE-9373: FunctionMatchQuery: add "matchCost" param 2020-08-24 00:07:55 -04:00
Erick Erickson c9c75810c2 Revert "LUCENE-9433: Remove Ant support from trunk"
This reverts commit 37cd17dc
2020-08-21 16:57:58 -04:00
Erick Erickson 37cd17dcf5 LUCENE-9433: Remove Ant support from trunk 2020-08-21 15:19:52 -04:00
Tomoko Uchida bbd21aa422 LUCENE-9448: Move README.txt to README.md; we no longer have a txt-format README on master. 2020-08-21 10:20:37 +09:00
Simon Willnauer fa878eb5b8 Fix test to actually use the resource from the try/finally block 2020-08-20 08:46:29 +02:00
Simon Willnauer 5fcb859ece
Ensure we only rollback IW once (#1764)
Today we might rollback IW more than once if we hit an exception during
the rollback code when we shutdown. This change moves the rollback code outside
the try block to ensure we always roll back but never roll back twice.
2020-08-20 08:40:25 +02:00
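The rollback fix described in 5fcb859ece can be illustrated with a small model. This is a minimal sketch in Python rather than Lucene's Java, and every name in it (`WriterModel`, `shutdown_rollback_inside_try`, `shutdown_rollback_outside_try`) is hypothetical. It assumes the old shape called rollback inside the try block and again from the cleanup path, which is one plausible reading of the commit message, not a copy of the actual IndexWriter code.

```python
class WriterModel:
    """Hypothetical stand-in for a writer; it only counts rollback calls."""

    def __init__(self, fail_on_commit=False, fail_on_rollback=False):
        self.fail_on_commit = fail_on_commit
        self.fail_on_rollback = fail_on_rollback
        self.rollback_calls = 0

    def commit(self):
        if self.fail_on_commit:
            raise RuntimeError("commit failed")

    def rollback(self):
        self.rollback_calls += 1
        if self.fail_on_rollback:
            raise RuntimeError("rollback failed")


def shutdown_rollback_inside_try(writer):
    """Assumed old shape: a failing rollback is retried by the cleanup path."""
    success = False
    try:
        writer.commit()
        writer.rollback()
        success = True
    finally:
        if not success:
            try:
                writer.rollback()
            except RuntimeError:
                pass  # the original exception is the one that propagates


def shutdown_rollback_outside_try(writer):
    """Assumed new shape: exactly one rollback on every path."""
    try:
        writer.commit()
    except Exception:
        try:
            writer.rollback()
        except RuntimeError:
            pass  # keep the commit failure as the primary error
        raise
    writer.rollback()
```

With a writer whose rollback throws, the first function calls rollback twice and the second calls it once; with a writer whose commit throws, both call it once.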
Simon Willnauer 70c72ff4b9 LUCENE-9467: Fix NRTCachingDirectory to use Directory#fileLength
to check if a file already exists instead of opening an IndexInput
on the file, which might throw an AccessDeniedException in some Directory implementations.
2020-08-17 18:32:47 +02:00
Houston Putman 6fced2e1e1 Add back-compat indices for 8.6.1 2020-08-14 16:58:43 -04:00
Houston Putman 58a6f956d0 Add bugfix version 8.6.1 2020-08-14 16:35:46 -04:00
Simon Willnauer 4267734e80
Ensure DWPTPool never releases any new DWPT after it's closed (#1751)
The DWPTPool should not release new DWPTs after it's closed. Yet, if the pool
is in a state where it's preventing new writers from being created in order to swap
the delete queue, it might get closed, and in that case we fail to throw an AlreadyClosedException
and instead release a new writer, which violates the condition that the pool is empty after it's closed
and all remaining DWPTs have been aborted.
2020-08-14 16:04:58 +02:00
Dawid Weiss 150a8dacb5
LUCENE-9463: Query match region retrieval component, passage scoring and formatting (#1750)
Reviewed as part of previous issue by @romseygeek
2020-08-14 14:21:12 +02:00
Simon Willnauer a003f64649 Fix TestForTooMuchCloning to ensure its MP is not reconfigured randomly 2020-08-14 12:14:31 +02:00
Dawid Weiss 6244383e0d
LUCENE-9462: Fields without positions should still return MatchIterator. (#1749) 2020-08-14 11:45:32 +02:00
Dawid Weiss 3579056249
Standalone distribution assembly and 'run' task for Luke (#1742)
Co-authored-by: Tomoko Uchida <tomoko.uchida.1111@gmail.com>
2020-08-12 16:28:48 +02:00
David Smiley 97c9bb732a
LUCENE spell: Implement SuggestWord.toString (#1735) 2020-08-11 12:45:49 -04:00
Mike Drob 092076ec39
LUCENE-9453 Assert lock held before volatile write (#1734)
Found via IntelliJ warnings.
2020-08-11 10:41:57 -05:00
Dawid Weiss 5375a2d2ad
LUCENE-9454: upgrade hamcrest to version 2.2. (#1738) 2020-08-11 11:55:52 +02:00
Julie Tibshirani 688583fc2d LUCENE-9427: Fuzzy query should always call consumeTermsMatching in visitor 2020-08-06 12:05:04 +01:00
Adrien Grand 9b369abc17 LUCENE-9446: Move CHANGES entry from 9.0 to 8.7. 2020-08-04 17:13:49 +02:00
Julie Tibshirani b91a161283
LUCENE-9446: In boolean rewrite, remove MatchAllDocsQuery filter clauses (#1709)
Previously, we only removed 'match all' FILTER clauses if there was at least one
MUST clause. Now they're also removed if there is another distinct FILTER clause.

This lets boolean queries like `#field:value #*:*` be rewritten to `#field:value`.
2020-08-04 17:08:10 +02:00
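The rewrite rule from b91a161283 can be modelled on plain data. The following is a rough sketch in Python, not Lucene's BooleanQuery code: clauses are hypothetical `(occur, query_string)` pairs, `"*:*"` stands in for MatchAllDocsQuery, and `drop_match_all_filters` is an illustrative name.

```python
MATCH_ALL = "*:*"  # stand-in for MatchAllDocsQuery in this model


def drop_match_all_filters(clauses):
    """Remove match-all FILTER clauses when another clause already restricts matches.

    `clauses` is a list of (occur, query) pairs, with occur one of
    "MUST", "FILTER", "SHOULD", "MUST_NOT".
    """
    has_must = any(occur == "MUST" for occur, _ in clauses)
    has_other_filter = any(
        occur == "FILTER" and query != MATCH_ALL for occur, query in clauses
    )
    if not (has_must or has_other_filter):
        # Dropping the filter here could change which documents match.
        return list(clauses)
    return [
        (occur, query)
        for occur, query in clauses
        if not (occur == "FILTER" and query == MATCH_ALL)
    ]


print(drop_match_all_filters([("FILTER", "field:value"), ("FILTER", MATCH_ALL)]))
# [('FILTER', 'field:value')]
```

A lone match-all FILTER clause is left alone in this model, since removing it would leave a query with no required clause.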
Mike McCandless cb457571e8 LUCENE-9440: call FieldInfo.checkConsistency for real (not under assert) 2020-07-30 14:59:55 -04:00
Mike McCandless d894a7e8d7 LUCENE-9395: ConstantValuesSource now shares a single DoubleValues instance across all segments 2020-07-30 08:53:44 -04:00
David Smiley 7d5b617973
LUCENE-9443: UnifiedHighlighter shouldn't close reader (#1706)
A regression from 8.6.  Don't close the underlying IndexReader.
2020-07-29 17:56:24 -04:00
Mike McCandless 327d860a00 LUCENE-9416: fix CheckIndex to print an invalid non-zero norm as unsigned long when detecting corruption 2020-07-29 10:04:33 -04:00
Mike McCandless e4c2be98fa LUCENE-9424: add a performance warning to AttributeSource.captureState javadocs 2020-07-27 11:11:35 -04:00
Zeno Gantner d0642600ff
LUCENE-9429 add missing semicolon (#1673) 2020-07-24 16:21:27 -05:00
Erick Erickson 67da34ac3b SOLR-14676: Update commons-collections to 4.4 and use it in Solr 2020-07-23 17:09:15 -04:00
Mike McCandless 03a03b34a4 LUCENE-9437: make DocValuesOrdinalsReader.decode public 2020-07-22 09:57:04 -04:00
Dawid Weiss 8ebf2d0b21 LUCENE-9312: Allow builds against arbitrary JVMs (squashed
jira/LUCENE-9312)
2020-07-21 09:19:38 +02:00
Dawid Weiss 8cf84a3725 Import Download task's plugin explicitly. 2020-07-20 10:57:13 +02:00
Bruno Roustant 522c146da5
Add back-compat indices for 8.6.0 2020-07-15 16:53:51 +02:00
Bruno Roustant efb936b787
Sync CHANGES for 8.6.0 2020-07-15 16:09:41 +02:00
Erick Erickson c346881ad6 SOLR-13939: https://issues.apache.org/jira/browse/SOLR-13939 2020-07-12 22:41:08 -04:00
Mike McCandless 60e0d8ac6e LUCENE-8574: the DoubleValues for dependent bindings for an expression are now cached and reused and no longer inefficiently recomputed per hit 2020-07-09 15:18:54 -04:00
Nhat Nguyen 20ec57a4fe
LUCENE-9423: Handle exc in NIOFSDirectory#openInput (#1658)
If we fail to get the size of a file in the constructor of 
NIOFSIndexInput, then we will leak a FileChannel opened in
NIOFSDirectory#openInput.
2020-07-09 09:14:05 -04:00
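The leak described in 20ec57a4fe follows a general pattern: a resource is opened, then a wrapper constructor fails before anyone owns the resource. A minimal Python sketch of the guarded version, with hypothetical names (`SizedInput`, `open_input`, `file_size`) standing in for the NIOFSIndexInput constructor and NIOFSDirectory#openInput:

```python
import os


class SizedInput:
    """Hypothetical wrapper that needs the file size at construction time."""

    def __init__(self, handle, size_fn):
        self.handle = handle
        self.length = size_fn(handle)  # may raise, like a failing size lookup


def file_size(handle):
    return os.fstat(handle.fileno()).st_size


def open_input(path, size_fn=file_size):
    handle = open(path, "rb")
    try:
        return SizedInput(handle, size_fn)
    except BaseException:
        # The wrapper never took ownership, so close the handle here
        # instead of leaking it, then let the original error propagate.
        handle.close()
        raise
```

On success the caller owns the returned object and is responsible for closing `handle`; on failure nothing is left open.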
markharwood 887fe4c83d
LUCENE-9386 add case insensitive RegExp matching option (#1541)
Added case insensitive search option (currently only works with ASCII characters)
2020-07-08 16:08:12 +01:00
Ishan Chattopadhyaya df3bc4288c SOLR-14603: Upgrade Restlet to 2.4.3 2020-07-04 14:53:00 +05:30
Erick Erickson 76e1d901cb SOLR-14592: Upgrade Zookeeper to 3.6.1 2020-07-01 19:47:16 -04:00
Alan Woodward 1ec78ac394 LUCENE-9418: Add CHANGES entry 2020-06-30 09:33:59 +01:00
Alan Woodward 3ff331072a
LUCENE-9418: Fix ordered intervals over interleaved terms (#1618)
Given the input text 'A B A C', an ordered interval 'A B C' will currently return an incorrect
interval [2, 3] in addition to the correct [0, 3] interval. This is due to a bug in the ORDERED
algorithm, where we assume that after the first interval is returned, the sub-intervals are
always in-order. This assumption only holds during minimization, as minimizing an interval
may move the earlier terms beyond the trailing terms.

For example, after the initial [0, 3] interval is found above, the algorithm will attempt to
minimize it by advancing A to [2,2]. Because this is still before C at [3,3], but after B at
[1,1], we then try advancing B, leaving it at [Inf,Inf]. Minimization has failed, so we return
the original interval of [0,3]. However, when we come to retrieve the next interval, our
subintervals look like this: A[2,2], B[Inf,Inf], C[3,3] - the assumption that they are in order
is broken. The algorithm sees that A is before B, assumes that therefore all subsequent
subintervals are in order, and returns the new interval.

This commit fixes things by changing the assumption of ordering to only hold during
minimization. When first finding a candidate interval, the algorithm now checks that
all sub-intervals appear in order.
2020-06-30 09:13:55 +01:00
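The expected result in 3ff331072a can be checked against a brute-force reference. The sketch below is Python and is not the ORDERED algorithm from Lucene's intervals code: it enumerates every in-order choice of positions and keeps only the minimal intervals. The function name `ordered_intervals` is hypothetical, and `terms` is assumed to be non-empty.

```python
from itertools import product


def ordered_intervals(tokens, terms):
    """Return minimal [start, end] intervals where `terms` occur in order."""
    # Positions at which each term occurs in the token stream.
    positions = [
        [i for i, tok in enumerate(tokens) if tok == term] for term in terms
    ]
    candidates = set()
    for combo in product(*positions):
        # Keep only choices whose positions are strictly increasing.
        if all(a < b for a, b in zip(combo, combo[1:])):
            candidates.add((combo[0], combo[-1]))
    # Drop any interval that contains another candidate interval.
    minimal = [
        iv
        for iv in candidates
        if not any(
            other != iv and iv[0] <= other[0] and other[1] <= iv[1]
            for other in candidates
        )
    ]
    return sorted(minimal)


print(ordered_intervals("A B A C".split(), ["A", "B", "C"]))
# [(0, 3)]
```

For 'A B A C' only [0, 3] survives: the A at position 2 comes after the only B, so no in-order match can start there, which is why [2, 3] is wrong.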
Ignacio Vera 0cef29f138
LUCENE-9417: Tessellator might fail when several holes are connected to the same vertex (#1614) 2020-06-29 17:46:21 +02:00
Simon Willnauer 3377b09fcb LUCENE-8962: Ensure we never flush by ram buffer or doc count in test 2020-06-28 09:38:27 +02:00
Simon Willnauer fb3c5d2353 LUCENE-8962: Fix changes entry. This feature is added to 8.6 2020-06-27 22:32:01 +02:00
Simon Willnauer 7f352a9665
LUCENE-8962: Merge small segments on commit (#1617)
Add IndexWriter merge-on-commit feature to selectively merge small segments on commit,
subject to a configurable timeout, to improve search performance by reducing the number of small
segments for searching.

Co-authored-by: Michael Froh <msfroh@apache.org>
Co-authored-by: Michael Sokolov <sokolov@falutin.net>
Co-authored-by: Mike McCandless <mikemccand@apache.org>
2020-06-27 22:25:45 +02:00
Erick Erickson ed025741d7 LUCENE-9389: Enhance logging messages in Lucene's Luke module 2020-06-25 22:44:10 -04:00
Mike Drob fa44f822e3
LUCENE-6669 Fix repeated "the the"
Co-Authored-By: Rich Bowen <rbowen@apache.org>
2020-06-24 16:15:51 -05:00
Simon Willnauer f47de19c4e
LUCENE-9408: Ensure OneMerge#mergeFinished is only called once (#1590)
In the case of an exception it's possible that some OneMerge instances
will be closed multiple times. This commit ensures that mergeFinished is
really just called once instead of multiple times.
2020-06-24 21:28:49 +02:00
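A call-once guard of the kind f47de19c4e describes can be sketched as follows. This is a single-threaded Python model with hypothetical names (`OneMergeModel`, `on_finished`), not the OneMerge class itself, and it makes no claim about how Lucene synchronizes the real call.

```python
class OneMergeModel:
    """Hypothetical model: close() may be called many times, the hook runs once."""

    def __init__(self, on_finished):
        self._on_finished = on_finished
        self._finished = False

    def close(self):
        if self._finished:
            return
        # Flip the flag first so a failing hook is not re-run by a later close().
        self._finished = True
        self._on_finished()


calls = []
merge = OneMergeModel(lambda: calls.append("finished"))
merge.close()
merge.close()
print(calls)
# ['finished']
```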
Erick Erickson 9c1772f094 LUCENE-9411: Fail compilation on warnings, 9x gradle-only 2020-06-23 16:21:10 -04:00
Mayya Sharipova b0333ab5c8
LUCENE-9280: Collectors to skip noncompetitive documents (#1351)
Similar to how scorers can update their iterators to skip non-competitive
documents, collectors and comparators should also provide and update
iterators that allow them to skip non-competitive documents.
2020-06-23 16:04:58 -04:00
Tomas Fernandez Lobbe 4774c6f0c1
Include delegate in AssertingSimilarity toString (#1596) 2020-06-22 16:38:00 -07:00
Michael Sokolov 5d43e73c66 Revert "LUCENE-8962: add ability to selectively merge on commit (#1552)"
This reverts commit 972c84022e.
2020-06-22 17:35:49 -04:00
Adrien Grand 541fc984e9 LUCENE-9409: Disable TestAllFilesDetectTruncation temporarily. 2020-06-21 10:03:42 +02:00
Michael Sokolov 972c84022e
LUCENE-8962: add ability to selectively merge on commit (#1552)
Co-authored-by: Michael Froh <msfroh@apache.org>
Co-authored-by: Simon Willnauer <simonw@apache.org>
2020-06-18 16:56:29 -04:00
Simon Willnauer 56febf05c3
Replace DWPT.DocState with simple method parameters (#1594)
DWPT.DocState had some historical value, but with today's somewhat
cleaned-up DWPT and IndexingChain there is little to no value in having
this class. It also requires explicit cleanup, which is no longer
necessary.
2020-06-18 20:02:10 +02:00
Tomas Fernandez Lobbe 4db1e3895f
LUCENE-9402: Let MultiCollector handle minCompetitiveScore (#1567) 2020-06-18 10:19:49 -07:00
Simon Willnauer efcf75a546 remove debug code 2020-06-17 23:33:48 +02:00
Simon Willnauer 9524cc4233 LUCENE-9408: roll back only called once enforcement 2020-06-17 23:31:43 +02:00
Simon Willnauer 59efe22ac2
LUCENE-8962: Allow waiting for all merges in a merge spec (#1585)
This change adds infrastructure to allow straightforward waiting
on one or more merges or an entire merge specification. This is
a basis for LUCENE-8962.
2020-06-17 22:48:12 +02:00
Adrien Grand ea0ad3ec51 LUCENE-9359: Avoid test failures when the extra file is a dir. 2020-06-17 09:36:02 +02:00
Adrien Grand 87a3bef50f
LUCENE-9353: Move terms metadata to its own file. (#1473) 2020-06-16 15:05:28 +02:00
Simon Willnauer c083e5414e
Cleanup TermsHashPerField (#1573)
Several classes within the IndexWriter indexing chain haven't been touched for 
several years. Most of these classes expose their internals through public
members and are difficult to construct in tests since they depend on many other
classes. This change tries to clean up TermsHashPerField and adds a dedicated
standalone test for it to make it more accessible for other developers since
it's simpler to understand. There are also attempts to make documentation better
as a result of this refactoring.
2020-06-16 14:45:45 +02:00
Robert Muir a108f90869
LUCENE-9404: simplify checksum calculation of ByteBuffersIndexOutput
Rather than copying from buffers, we can pass the buffers directly to the checksum with good performance in JDK9+
2020-06-16 06:36:57 -04:00
Robert Muir 4decd5aa9c
LUCENE-9403: tune BufferedChecksum.DEFAULT_BUFFERSIZE
Increase the buffersize used for ChecksumIndexInput for better crc performance.
2020-06-16 06:24:46 -04:00
Adrien Grand 2b61b205fc
LUCENE-9396: Improve truncation detection for points. (#1557) 2020-06-16 12:04:41 +02:00
Ignacio Vera 75491ab381
LUCENE-9400: Tessellator might fail when several holes share the same vertex (#1562) 2020-06-16 09:00:36 +02:00
Simon Willnauer 47cffbcdd8
LUCENE-9405: Ensure IndexWriter only closes merge readers once. (#1580)
IndexWriter incorrectly calls closeMergeReaders twice when the
merged segment is 100% deleted, i.e. when the merge would produce a
fully deleted segment.
2020-06-15 22:02:15 +02:00
Michael Sokolov 26075fc1dc
LUCENE-9394: fix and suppress warnings (#1563)
* LUCENE-9394: fix and suppress warnings in lucene/*
* Change type of ValuesSource context from raw Map to Map<Object, Object>
2020-06-12 07:25:31 -04:00
Adrien Grand cf8f83cef9 LUCENE-9356: Disable test, some corruptions are still not detected as corruptions. 2020-06-12 09:16:54 +02:00
Adrien Grand 38adf09ca2 LUCENE-9356: Make FST throw the correct exception upon incorrect input type. 2020-06-12 09:16:54 +02:00
Patrick Zhai 2991acf8ff
LUCENE-9391: Upgrade HPPC to 0.8.2 (#1560)
* LUCENE-8574: Upgrade HPPC to 0.8.2 (Co-authored-by: Haoyu Zhai <haoyzhai@amazon.com>)
2020-06-12 07:36:43 +02:00
Bruno Roustant 75d25ad677
LUCENE-9397: UniformSplit supports encodable fields metadata. 2020-06-11 18:19:48 +02:00
Adrien Grand 36109ec362
LUCENE-9356: Add a test that verifies that Lucene catches bit flips. (#1569) 2020-06-11 18:09:09 +02:00
Mike McCandless 138cdd758a LUCENE-9392: make FacetsConfig.DELIM_CHAR public 2020-06-10 12:37:04 -04:00
Bruno Roustant 2fe713b348
Fix TestPhraseWildcardQuery.testExplain to make it less fragile. 2020-06-10 16:30:25 +02:00
Ignacio Vera 37a83675a7
LUCENE-9398: Always keep BKD index off-heap. BKD reader does not implement Accountable any more (#1558) 2020-06-10 08:13:12 +02:00
Adrien Grand 54c5dd7d6d
LUCENE-9148: Move the BKD index to its own file. (#1475) 2020-06-09 09:59:14 +02:00
David Smiley 89784ad7be
LUCENE-9383: benchmark module: Gradle conversion (#1550) 2020-06-05 17:57:53 -04:00
Tomas Fernandez Lobbe 62abdac116 LUCENE-9393: Fix CHANGES entry 2020-06-05 11:23:38 -07:00
Tomas Fernandez Lobbe e1a97a0f1e
LUCENE-9393: FunctionScoreQuery turns TOP_DOCS to COMPLETE in inner weights (#1553)
FunctionScoreQuery can't really use the WAND algorithm even if the TOP_DOCS score mode is requested. This commit makes the inner weight it creates use COMPLETE.
2020-06-05 11:04:13 -07:00
Uwe Schindler 08a13ce589 Upgrade forbiddenapis to hotfix release 3.0.1 (allows upgrade to commons-io 2.7 in Solr) 2020-06-04 01:01:42 +02:00
noble 0c4836b25a Add 7.7.3 back compat test indexes 2020-06-03 15:30:55 -05:00
Mike Drob 58958c9531 LUCENE-9365 CHANGES.txt 2020-06-03 15:21:42 -05:00
Mike Drob 45611d0647
LUCENE-9365 FuzzyQuery false negative when prefix length == search term length (#1545)
Co-Authored-By: markharwood <markharwood@gmail.com>
2020-06-03 15:19:30 -05:00
Adrien Grand 22cb4d4d95 LUCENE-9359: Address test failures when the codec version gets modified. 2020-05-29 21:16:10 +02:00
Adrien Grand f908f2cd48 LUCENE-9359: Always call checkFooter in SegmentInfos#readCommit. (#1483) 2020-05-29 16:05:46 +02:00
Adrien Grand fe07d9dd0e Revert "LUCENE-9359: Always call checkFooter in SegmentInfos#readCommit. (#1483)"
This reverts commit bfb6bf9c9a.
2020-05-29 15:45:51 +02:00
Adrien Grand bfb6bf9c9a
LUCENE-9359: Always call checkFooter in SegmentInfos#readCommit. (#1483) 2020-05-29 14:59:36 +02:00
Mike Drob 18519f3eb8 Add back-compat indices for 8.5.2 2020-05-27 13:18:15 -05:00
Mike Drob a240f0ba3f Add bugfix version 8.5.2 2020-05-27 12:09:36 -05:00
Erick Erickson b576ef6c8c LUCENE-9380: Fix auxiliary class warnings in Lucene 2020-05-27 12:36:39 -04:00
Andrzej Bialecki 22044fcabb SOLR-14498: Upgrade to Caffeine 2.8.4, which fixes the cache poisoning issue. 2020-05-26 12:56:08 +02:00
Alan Woodward de2bad9039
LUCENE-9330: Make SortFields responsible for index sorting and serialization (#1440)
This commit adds a new class IndexSorter which handles how a sort should be applied
to documents in an index:

* how to serialize/deserialize sort info in the segment header
* how to sort documents within a segment
* how to sort documents from merging segments

SortField has a getIndexSorter() method, which will return null if the sort cannot be used
to sort an index (e.g. if it uses scores or other query-dependent values). This also requires a
new Codec, as there is a change to the SegmentInfoFormat.
2020-05-22 13:33:06 +01:00
David Smiley 3fba3daa95
SOLR-14461: Replace commons-fileupload with Jetty (#1490) 2020-05-22 00:34:48 -04:00
Erick Erickson 21b08d5cab LUCENE-9376: Fix or suppress 20 resource leak precommit warnings in lucene/search 2020-05-21 20:29:18 -04:00
markharwood 44fc5b989a
LUCENE-9371: Allow external access to RegExp's parsed structure (#1521)
Made RegExp internal fields public final to allow external classes to render, e.g., English explanations of pattern logic.
2020-05-19 17:38:00 +01:00
Uwe Schindler 06df50e759
LUCENE-9321: Port markdown task to Gradle (#1477) 2020-05-17 14:46:26 +02:00
Dawid Weiss eebe40a2f5 LUCENE-9372: gradlew does not run on cygwin (Peter Barna via Dawid Weiss) 2020-05-16 10:58:23 +02:00
erick 88aff5d006 LUCENE-9232: Fix or suppress 13 resource leak precommit warnings in lucene/replicator 2020-05-15 17:51:11 -04:00
Ignacio Vera 98ef96ccbb
LUCENE-9288: poll_mirrors.py release script can handle HTTPS mirrors (#1520) 2020-05-15 16:33:23 +02:00