1617 Commits

Author SHA1 Message Date
Koen De Groote 67104dd615 LUCENE-8847: Code Cleanup: Rewrite StringBuilder.append with concatenated strings (#707)
This commit affects every place in the codebase where the argument of a StringBuilder.append() call is itself a regular String concatenation.
This defeats the purpose of using StringBuilder and also introduces an extra allocation.
These changes should avoid that.

Ant tests ran and succeeded on a local machine.

Removing test files from the changes.

Another suggested rework.
2019-06-10 18:07:43 +02:00
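A minimal JDK-only illustration of the pattern this cleanup targets; the class and method names are hypothetical. Passing a concatenation to append() first builds a temporary String, while chaining append() calls writes each piece directly into the builder:

```java
final class AppendExample {
  // Before: the argument is concatenated into a temporary String first,
  // which is then copied into the builder (an extra allocation).
  static String before(String field, int count) {
    StringBuilder sb = new StringBuilder();
    sb.append(field + ":" + count);
    return sb.toString();
  }

  // After: each piece is appended directly; no intermediate String is built.
  static String after(String field, int count) {
    StringBuilder sb = new StringBuilder();
    sb.append(field).append(':').append(count);
    return sb.toString();
  }
}
```

Both methods return `body:3` for `("body", 3)`; only the second avoids the intermediate String.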
Atri Sharma 965fd194d1 LUCENE-8825: Improve CheckHits's Printing Capabilities
Signed-off-by: Adrien Grand <jpountz@gmail.com>
2019-06-07 18:47:41 +02:00
Atri Sharma 87e936f1bb LUCENE-8757: Improving Default Segments To Thread Mapping Algorithm
The current slicing algorithm assigns a thread per segment, which
can be detrimental to performance when the distribution has
a large number of small segments. The patch introduces a slicing
algorithm which coalesces smaller segments into a single thread,
thus reducing the impact of context switching by limiting the
number of threads.

Signed-off-by: Adrien Grand <jpountz@gmail.com>
2019-05-21 20:18:42 +02:00
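The coalescing idea can be sketched over plain segment doc counts. This is a hypothetical illustration, not the code from the patch: `maxDocsPerSlice` and `maxSegmentsPerSlice` are assumed parameters, and largest-first grouping is one reasonable way to realize the description above.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class SliceSketch {
  /**
   * Groups segment sizes (doc counts) into slices, one slice per thread.
   * A segment larger than maxDocsPerSlice gets its own slice; smaller segments
   * are coalesced until the slice exceeds maxDocsPerSlice documents or holds
   * maxSegmentsPerSlice segments. Both limits are assumed parameters.
   */
  static List<List<Integer>> slices(
      List<Integer> segmentSizes, int maxDocsPerSlice, int maxSegmentsPerSlice) {
    List<Integer> sorted = new ArrayList<>(segmentSizes);
    sorted.sort(Comparator.reverseOrder()); // largest segments first

    List<List<Integer>> slices = new ArrayList<>();
    List<Integer> group = new ArrayList<>();
    long docsInGroup = 0;
    for (int size : sorted) {
      if (size > maxDocsPerSlice) {
        slices.add(List.of(size)); // big enough to deserve its own thread
        continue;
      }
      group.add(size);
      docsInGroup += size;
      if (docsInGroup > maxDocsPerSlice || group.size() >= maxSegmentsPerSlice) {
        slices.add(group); // close the current group of small segments
        group = new ArrayList<>();
        docsInGroup = 0;
      }
    }
    if (!group.isEmpty()) {
      slices.add(group); // leftover small segments share one final slice
    }
    return slices;
  }
}
```

For segment sizes `500000, 300, 200, 100, 50, 10` with limits `250000` and `5`, the large segment gets its own slice and the five small ones share a second, so two threads are used instead of six.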
Namgyu Kim 5a694ea26f LUCENE-8805: Parameter changes for stringField() in StoredFieldVisitor
Signed-off-by: Namgyu Kim <kng0828@gmail.com>
Signed-off-by: Adrien Grand <jpountz@gmail.com>
2019-05-21 20:18:42 +02:00
Dawid Weiss 5c9e7d5351 LUCENE-8781: FST lookup performance has been improved in many cases by encoding Arcs using full-sized arrays with gaps. The new encoding is enabled for postings in the default codec and for suggesters. (Mike Sokolov) 2019-05-06 11:19:35 +02:00
Nicholas Knize faa78ad72c LUCENE-8736: Fix Polygon.contains to appropriately check longitude range, and pass correct line segment vertices in EdgeTree 2019-04-18 13:15:07 -05:00
Uwe Schindler faaee86efb LUCENE-8738: Move to Java 11 as minimum Java version (merged branch: jira/LUCENE-8738)
Co-authored-by: Adrien Grand <jpountz@apache.org>
2019-04-16 14:00:09 +02:00
Simon Willnauer a302be381e LUCENE-8671: Introduce Reader attributes (#640)
Reader attributes allow per-IndexReader configuration of codec internals.
For instance, this allows configuring per reader whether FSTs are loaded into memory or are left
on disk.
2019-04-15 20:39:36 +02:00
Nicholas Knize 55c241d87f LUCENE-8736: Fix LatLonShapePolygonQuery and Polygon2D.contains to correctly include points that fall on the boundary 2019-04-11 09:27:36 -05:00
Simon Willnauer a9503d2e81 LUCENE-8754: Prevent ConcurrentModificationException in SegmentInfo (#637)
To prevent ConcurrentModificationException, this change makes all maps
in SegmentInfo unmodifiable and copy-on-write. MergePolicies can access
these maps without synchronization, which could cause exceptions if a map
was modified concurrently in the merge thread.
2019-04-10 09:29:22 +02:00
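A sketch of the copy-on-write approach using only the JDK. The class name is hypothetical and SegmentInfo's real fields are not reproduced; the point is that readers only ever see an immutable snapshot while writers publish a fresh copy:

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

final class CopyOnWriteAttributes {
  // Readers always get an immutable snapshot, so iterating it can never throw
  // ConcurrentModificationException, even without synchronization.
  private volatile Map<String, String> attributes = Collections.emptyMap();

  Map<String, String> getAttributes() {
    return attributes;
  }

  // Writers copy the current snapshot, modify the copy, and publish a new
  // unmodifiable map. Synchronized so concurrent writers do not lose updates.
  synchronized String putAttribute(String key, String value) {
    Map<String, String> copy = new HashMap<>(attributes);
    String previous = copy.put(key, value);
    attributes = Collections.unmodifiableMap(copy);
    return previous;
  }
}
```

A snapshot obtained before a write keeps its old contents, which is what makes unsynchronized reads from a merge policy safe.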
Simon Willnauer 1ec229b604 LUCENE-8671: Expose FST off/on-heap options on Lucene50PostingsFormat (#613)
Before we can expose options to configure this postings format
on a per-reader basis, we need to expose the option to load the terms
index FST off or on heap on the postings format. This already allows an
expert user to change the default in a per-field postings format.
This essentially provides the ability to change defaults globally,
although it still requires some glue code.
2019-04-04 16:59:37 +02:00
Henning Andersen 04afdb6442 LUCENE-8735: Avoid FileAlreadyExistsException on Windows. (#619)
FilterDirectory.getPendingDeletions() did not delegate the call, which
resulted in a new IndexWriter on the same directory not considering
pending-delete files. This could in turn result in a FileAlreadyExistsException
when running on Windows.
2019-03-26 14:56:45 +01:00
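A minimal sketch of the delegation fix. `Dir` and `FilterDir` are hypothetical stand-ins, not Lucene's `Directory` and `FilterDirectory`; they only show that a wrapper must forward `getPendingDeletions()` so callers see the wrapped directory's pending deletes:

```java
import java.util.Set;

// Hypothetical, stripped-down stand-in for a directory abstraction.
interface Dir {
  Set<String> getPendingDeletions();
}

class FilterDir implements Dir {
  protected final Dir in;

  FilterDir(Dir in) {
    this.in = in;
  }

  // The fix: forward the call to the wrapped directory. Returning an empty set
  // here instead would hide files that are still waiting to be deleted.
  @Override
  public Set<String> getPendingDeletions() {
    return in.getPendingDeletions();
  }
}
```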
Simon Willnauer 14175c46d2 LUCENE-8671: Load FST off-heap if reader is not opened from an index writer (#610)
Today we never load FSTs of ID-like fields off-heap since we need
very fast access for updates. Yet, a reader that is not loaded from
an IndexWriter can also leave the FST on disk. This change adds
this information to SegmentReadState to allow the postings format
to make this decision without configuration.
2019-03-20 11:28:10 +01:00
Adrien Grand 577bef53dd LUCENE-8166: Require merge instances to be consumed in the thread that created them. 2019-03-19 10:51:54 +01:00
Simon Willnauer ad457d188e Improve RIW exception handling and opt out of concurrent flushing if exception is expected 2019-03-15 11:00:16 +01:00
Adrien Grand 425f207f40 LUCENE-8688: Forced merges merge more than necessary. 2019-03-15 10:27:27 +01:00
Alan Woodward fbd05167f4 LUCENE-3041: QueryVisitor (#581)
This commit adds an introspection API to Query, allowing users to traverse
the nested structure of a query and examine its leaves. It replaces the existing
`extractTerms` method on Weight, and alters some highlighting code to use
the new API.
2019-03-14 15:04:33 +00:00
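The visitor idea can be sketched with a hypothetical, stripped-down query model. These are not Lucene's `Query` or `QueryVisitor` classes, whose API is richer; the sketch only shows how compound nodes pass a visitor down so that a caller can collect the leaves:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical visitor and query-node types for illustration only.
interface Visitor {
  void visitLeaf(String term);
}

interface Node {
  void visit(Visitor visitor);
}

final class TermNode implements Node {
  final String term;

  TermNode(String term) { this.term = term; }

  // A leaf reports itself to the visitor.
  @Override
  public void visit(Visitor visitor) { visitor.visitLeaf(term); }
}

final class BooleanNode implements Node {
  final List<Node> clauses;

  BooleanNode(List<Node> clauses) { this.clauses = clauses; }

  // A compound node does not inspect its children; it passes the visitor down.
  @Override
  public void visit(Visitor visitor) {
    for (Node clause : clauses) {
      clause.visit(visitor);
    }
  }
}

final class CollectTerms {
  // Plays the role of an extractTerms-style method: walk the tree, gather leaves.
  static List<String> terms(Node root) {
    List<String> out = new ArrayList<>();
    root.visit(out::add);
    return out;
  }
}
```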
Simon Willnauer ffb1fc83de Concurrently flush next buffer during commit in RandomIndexWriter (#607)
This is a spin-off from `LUCENE-8700` that is satisfied by IndexWriter#flushNextBuffer.
The idea here is to additionally call flushNextBuffer in RandomIndexWriter for better
test coverage. This is a test-only change.
2019-03-14 15:43:35 +01:00
Alan Woodward 7ad0ac0191 LUCENE-8714: Don't use NoMergePolicy in norms tests
This can cause spurious failures when run in conjunction with HandleLimitFS,
as we can end up with lots of very small segments, which trips the file-handle
limit.
2019-03-01 14:47:54 +00:00
Simon Willnauer 4a513fa99f LUCENE-8292: Make TermsEnum fully abstract (#574) 2019-02-15 17:32:55 +01:00
yyuan2 a3a4ecd80b LUCENE-8662: Change TermsEnum.seekExact(BytesRef) to abstract 2019-02-08 15:10:38 -08:00
iverase 5d1d6448b9 LUCENE-8673: Use radix partitioning when merging dimensional points instead of sorting all dimensions beforehand. 2019-02-07 08:12:13 +01:00
Dawid Weiss d7dc53ff7c LUCENE-8474: Remove deprecated RAMDirectory. 2019-01-28 13:49:03 +01:00
Toke Eskildsen c13645bd4c LUCENE-8585: Create jump-tables for DocValues at index-time 2019-01-18 22:42:04 +01:00
Dawid Weiss efef89adc6 LUCENE-8642: RamUsageTester.sizeOf ignores arrays and collections if --illegal-access=deny. 2019-01-18 11:55:53 +01:00
Dawid Weiss f2352e9456 Revert "LUCENE-8642, LUCENE-8641: correct RamUsageTester.sizeOf's handling of ByteBuffers. Throw exceptions on denied reflection to catch problems early. This affects tests only."
This reverts commit a16f0833ed.
2019-01-17 13:05:36 +01:00
Dawid Weiss a16f0833ed LUCENE-8642, LUCENE-8641: correct RamUsageTester.sizeOf's handling of ByteBuffers. Throw exceptions on denied reflection to catch problems early. This affects tests only. 2019-01-17 12:23:30 +01:00
Dawid Weiss d4e016afdf LUCENE-8474: (partial) removal of accesses to RAMFile and RAMDirectory streams. Removal of GrowableByteArrayDataOutput. 2019-01-15 13:42:25 +01:00
Steve Rowe 283b19a8da LUCENE-8527: Upgrade JFlex to 1.7.0. StandardTokenizer and UAX29URLEmailTokenizer now support Unicode 9.0, and provide UTS#51 v11.0 Emoji tokenization with the '<EMOJI>' token type. 2019-01-08 13:33:49 -05:00
Dawid Weiss f28c5bec9b LUCENE-8604: TestRuleLimitSysouts now has an optional "hard limit" of bytes that can be written to stderr and stdout (anything beyond the hard limit is ignored). The default hard limit is 2 GB of logs per test class. 2018-12-18 22:03:44 +01:00
Dawid Weiss e916f1fb86 LUCENE-8611: Update randomizedtesting to 2.7.2, JUnit to 4.12, add hamcrest-core dependency. 2018-12-15 09:49:36 +01:00
Simon Willnauer e974311d91 LUCENE-8609: Allow getting consistent docstats from IndexWriter
Today we have #numDocs() and #maxDoc() on IndexWriter. This is enough
to get all stats for the current index, but it's subject to concurrency
and might return numbers that are not consistent, i.e. in some cases it can
return maxDoc < numDocs, which is undesirable. This change adds a getDocStats()
method to IndexWriter to allow fetching consistent numbers for these stats.

This change also deprecates IndexWriter#numDocs() and IndexWriter#maxDoc()
and replaces all their usages with IndexWriter#getDocStats().
2018-12-14 19:36:25 +01:00
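A minimal sketch of why a single snapshot method helps. `DocCounters` and `DocStats` are hypothetical, not IndexWriter's implementation; reading both counters under one lock means the returned pair is always mutually consistent:

```java
final class DocCounters {
  /** Immutable snapshot so both numbers come from the same moment. */
  static final class DocStats {
    final int maxDoc;
    final int numDocs;

    DocStats(int maxDoc, int numDocs) {
      this.maxDoc = maxDoc;
      this.numDocs = numDocs;
    }
  }

  private int maxDoc;  // every document added, including deleted ones
  private int numDocs; // live documents only

  synchronized void addDocument() {
    maxDoc++;
    numDocs++;
  }

  synchronized void deleteDocument() {
    if (numDocs == 0) {
      throw new IllegalStateException("no live documents");
    }
    numDocs--;
  }

  // Reading both fields under the same lock guarantees numDocs <= maxDoc,
  // which two separate numDocs()/maxDoc() calls cannot promise.
  synchronized DocStats getDocStats() {
    return new DocStats(maxDoc, numDocs);
  }
}
```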
Alan Woodward f5867a1413 LUCENE-8564: Add GraphTokenFilter 2018-12-04 09:47:42 +00:00
Michael Sokolov 6728f0c4f4 update comment after limiting number of debug tokens 2018-11-27 06:00:29 -05:00
Michael Sokolov 34ed01543a fixing javadoc; added docs for parameters of new method 2018-11-27 06:00:29 -05:00
Michael Sokolov 54907903e8 LUCENE-8517: do not wrap FixedShingleFilter with conditional in TestRandomChains 2018-11-27 06:00:29 -05:00
Dawid Weiss bd3ce916bd LUCENE-8568: TestMockDirectoryWrapper/ RAMInputStream NPE. 2018-11-20 13:37:29 +01:00
Erick Erickson 763e64260f SOLR-12881: Remove unneeded import statements 2018-11-14 17:48:15 -08:00
Dawid Weiss 4e2481b04b LUCENE-8560: TestByteBuffersDirectory.testSeekPastEOF() failures with ByteArrayIndexInput. ByteArrayIndexInput removed entirely, without a replacement. 2018-11-10 16:54:28 +01:00
David Smiley d0cd4245bd LUCENE-8557: LeafReader.getFieldInfos should always return the same instance
MemoryIndex: compute/cache up-front
Solr Collapse/Expand with top_fc: compute/cache up-front
Json Facets numerics / hash DV: use the cached fieldInfos on SolrIndexSearcher
SolrIndexSearcher: move the cached FieldInfos to SlowCompositeReaderWrapper
2018-11-06 14:45:32 -05:00
David Smiley fd9164801e LUCENE-7875: Moved MultiFields static methods to MultiTerms, FieldInfos and MultiBits.
MultiBits is now public and has getLiveDocs.
2018-10-18 19:49:14 -04:00
Christine Poerschke 1ccd555862 Fix couple of typos. 2018-10-15 15:08:17 -04:00
Nicholas Knize 1118299c33 LUCENE-8496: Selective indexing - modify BKDReader/BKDWriter to allow users to select fewer dimensions for creating the index than the total number of dimensions used for field encoding, i.e., dimensions 0 to N may be used to determine how to split the inner nodes, and dimensions N+1 to D are ignored and stored as data dimensions at the leaves. 2018-10-08 18:51:03 -05:00
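Assuming a simple widest-range heuristic for choosing the split dimension (a simplification, not BKDWriter's exact logic), selective indexing amounts to restricting that choice to the first `numIndexDims` dimensions. The names below are hypothetical:

```java
final class SplitDimSketch {
  /**
   * Returns the dimension with the widest value range, looking only at
   * dimensions 0 .. numIndexDims-1. Dimensions at or beyond numIndexDims are
   * data dimensions: stored at the leaves but never used to split.
   * Assumes max - min fits in a long for every dimension.
   */
  static int splitDim(long[] minPerDim, long[] maxPerDim, int numIndexDims) {
    if (numIndexDims < 1 || numIndexDims > minPerDim.length) {
      throw new IllegalArgumentException("numIndexDims out of range: " + numIndexDims);
    }
    int best = 0;
    long bestSpread = Long.MIN_VALUE;
    for (int dim = 0; dim < numIndexDims; dim++) {
      long spread = maxPerDim[dim] - minPerDim[dim];
      if (spread > bestSpread) {
        best = dim;
        bestSpread = spread;
      }
    }
    return best;
  }
}
```

With ranges `10`, `50` and `1000` in dimensions 0, 1 and 2, indexing only two dimensions splits on dimension 1, even though dimension 2 is wider.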
David Smiley fe844c739b LUCENE-8513: Remove MultiFields.getFields
SlowCompositeReaderWrapper now works with MultiTerms directly
2018-10-01 10:39:12 -04:00
Alan Woodward c696cafc0d LUCENE-8352: Make TokenStreamComponents final 2018-09-19 10:02:56 +01:00
Adrien Grand a9acdfdb54 LUCENE-8340: Recency boosting. 2018-09-04 14:03:24 +02:00
Alan Woodward 910a0231f6 LUCENE-6228: Add Scorable class and make LeafCollector.setScorer() take Scorable 2018-09-04 11:01:44 +01:00
Dawid Weiss 54f2565038 LUCENE-8469: Inline calls to the deprecated StringHelper.compare, removed StringHelper.compare from master. 2018-08-30 09:59:51 +02:00
Dawid Weiss ce504f4f81 LUCENE-8468: add ByteBuffersDirectory to randomized Directory implementations in LuceneTestCase (master branch only). 2018-08-29 10:43:00 +02:00
Adrien Grand 025350ea12 LUCENE-8461: Add Lucene80Codec. 2018-08-23 10:51:45 +02:00
Jim Ferenczi 49e3cca77f LUCENE-8204: Boolean queries with a mix of required and optional clauses are now faster if the total hit count is not required 2018-08-08 15:49:58 +02:00
Adrien Grand e56c8722ce Revert "Make the nightly test smaller so that it does not fail with GC overhead exceeded (OOM). Clean up random number fetching to make it shorter."
This reverts commit 3203e99d8f.
2018-08-01 15:44:57 +02:00
Adrien Grand 86a39fa29f Revert "Fix AAIOOBE in GeoTestUtil."
This reverts commit c3e813188e.
2018-08-01 15:44:47 +02:00
Adrien Grand c3e813188e Fix AAIOOBE in GeoTestUtil. 2018-08-01 15:17:53 +02:00
Dawid Weiss 3203e99d8f Make the nightly test smaller so that it does not fail with GC overhead exceeded (OOM). Clean up random number fetching to make it shorter. 2018-08-01 14:05:02 +02:00
Adrien Grand 99dbe93681 LUCENE-8060: IndexSearcher's search and searchAfter methods now only compute total hit counts accurately up to 1,000. 2018-08-01 09:01:21 +02:00
Steve Rowe a08eadb480 Fix InfixSuggestersTest.testShutdownDuringBuild() failures 2018-07-30 22:49:49 -04:00
Adrien Grand 61e89e3ca0 LUCENE-8431: Top-docs collectors now collect lower bounds of the hit count. 2018-07-30 16:38:05 +02:00
Adrien Grand 9ca053712a LUCENE-8430: TopDocs.totalHits may now be a lower bound of the hit count. 2018-07-30 16:38:05 +02:00
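The exact-versus-lower-bound distinction behind these changes can be sketched as a counter that counts precisely only until a threshold is passed. `HitCounter` and its `Relation` enum are hypothetical stand-ins, not Lucene's collector or `TotalHits` code:

```java
final class HitCounter {
  enum Relation { EQUAL_TO, GREATER_THAN_OR_EQUAL_TO }

  private final int totalHitsThreshold;
  private int count;

  HitCounter(int totalHitsThreshold) {
    this.totalHitsThreshold = totalHitsThreshold;
  }

  // Counts exactly until the threshold is passed; later hits are not counted,
  // so the count becomes a lower bound of the true hit count.
  void collect() {
    if (count <= totalHitsThreshold) {
      count++;
    }
  }

  int value() {
    return count;
  }

  Relation relation() {
    return count > totalHitsThreshold ? Relation.GREATER_THAN_OR_EQUAL_TO : Relation.EQUAL_TO;
  }
}
```

With a threshold of 1,000, exactly 1,000 hits are reported as 1000 with `EQUAL_TO`, while 5,000 hits are reported as 1001 with `GREATER_THAN_OR_EQUAL_TO`.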
Dawid Weiss d25f62634b LUCENE-8415: test quirk follow up. MockDirectoryWrapper uses AccessDeniedException (a subclass of IOException) to signal files still open for writing when read access is requested. 2018-07-25 11:34:31 +02:00
Dawid Weiss 8892c0d9af LUCENE-8415: Clean up Directory contracts (write-once, no reads-before-write-completed). Minor test improvements and cleanups. 2018-07-24 08:47:50 +02:00
Jason Gerlowski 6ed9607f74 SOLR-12555: Add add'l expectThrows() test helper 2018-07-23 20:37:04 -04:00
Alan Woodward 028c86b1fa LUCENE-8306: Allow iteration over submatches
Also includes LUCENE-8404, adding match iteration to SpanQuery
2018-07-23 10:02:01 +01:00
Alan Woodward 6e3f61f6f9 Revert "LUCENE-8306: Allow iteration over submatches"
Incorrect patch committed in error

This reverts commit a8839b7eab.
2018-07-22 22:36:46 +01:00
Alan Woodward a8839b7eab LUCENE-8306: Allow iteration over submatches 2018-07-22 21:42:46 +01:00
Adrien Grand 331ccf3910 LUCENE-8405: Remove TopDocs.maxScore. 2018-07-18 08:38:57 +02:00
Adrien Grand 8093c450c1 LUCENE-8263: Replace TieredMergePolicy's reclaimDeletesWeight with deletesPctAllowed. 2018-07-17 18:31:06 +02:00
Adrien Grand d730c8b214 LUCENE-8060: Remove usage of TopDocs#totalHits that should really be IndexSearcher#count.
Many tests were written before we introduced IndexSearcher#count and used
`searcher.search(query, 1).totalHits` to get the number of matches of a query
rather than `searcher.count(query)`.
2018-07-17 14:32:02 +02:00
Michael Braun f0e1864ceb Merge remote-tracking branch 'source/master' into remove-constructor-wrapper-classes 2018-07-14 13:39:37 -04:00
Nicholas Knize b5ef13330f LUCENE-8396: Add Points Based Shape Indexing and Search that decomposes shapes into a triangular mesh and indexes individual triangles as a 6 dimension point 2018-07-14 11:28:37 -05:00
Adrien Grand b1bb11b79d LUCENE-8391: More tests for merge policies. 2018-07-10 09:17:34 +02:00
Adrien Grand 41ddac5b44 LUCENE-8385: Fix computation of the allowed segment count in TieredMergePolicy. 2018-07-09 15:21:10 +02:00
Erick Erickson c303c5f126 LUCENE-8370: Reproducing TestLucene{54,70}DocValuesFormat.testSortedSetVariableLengthBigVsStoredFields() failures 2018-06-28 18:28:37 -07:00
Alan Woodward ab2fec1642 LUCENE-8237: Correct handling of position increments in sub-tokenstreams 2018-06-18 09:57:38 +01:00
Nhat Nguyen 8a6f1bf5ad LUCENE-8165: Ban copyOf and copyOfRange.
These methods are lenient with out-of-bounds indices.

Signed-off-by: Adrien Grand <jpountz@gmail.com>
2018-06-07 10:08:21 +02:00
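The leniency is easy to see in the JDK itself: `Arrays.copyOfRange(new int[] {1, 2, 3}, 1, 5)` returns `{2, 3, 0, 0}` instead of failing. A strict replacement might look like the following; the class and method names are illustrative, not necessarily the helper the commit introduced:

```java
final class StrictCopy {
  // Arrays.copyOfRange silently pads with zeros when `to` runs past the end;
  // this variant rejects any range that is not fully inside the source array.
  static int[] copyOfSubArray(int[] array, int from, int to) {
    if (from < 0 || from > to || to > array.length) {
      throw new IndexOutOfBoundsException(
          "range [" + from + ", " + to + ") out of bounds for length " + array.length);
    }
    int[] copy = new int[to - from];
    System.arraycopy(array, from, copy, 0, to - from);
    return copy;
  }
}
```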
Michael Braun 78079fc552 Merge remote-tracking branch 'source/master' into remove-constructor-wrapper-classes 2018-06-05 18:48:55 -04:00
Simon Willnauer 59087d148a [TEST] Ensure MDW.assertNoUnreferencedFilesOnClose is threadsafe 2018-06-04 17:33:18 +02:00
Simon Willnauer fe83838ec3 LUCENE-8341: Record soft deletes in SegmentCommitInfo
This change adds the number of documents that are soft-deleted but
not hard-deleted to the segment commit info. This is the last step
towards making soft deletes as powerful as hard deletes, since now the
number of documents can be read from commit points without opening a
full-blown reader. This also allows merge policies to make decisions
without requiring an NRT reader to get the relevant statistics. This
change doesn't enforce any field to be used as soft deletes and the statistic
is maintained per segment.
2018-06-04 15:05:12 +02:00
Simon Willnauer e7a0a12926 LUCENE-8335: Enforce soft-deletes field up-front
Soft deletes field must be marked as such once it's introduced
and can't be changed after the fact.

Co-authored-by: Nhat Nguyen <nhat.nguyen@elastic.co>
2018-06-04 08:28:38 +02:00
Michael Braun fb6574100e LUCENE-8345 - add wrapper class constructors to forbiddenapis 2018-06-03 15:40:50 -04:00
Simon Willnauer 3dc4fa199c Revert "LUCENE-8335: Enforce soft-deletes field up-front."
This reverts commit a2d9276674.
2018-06-02 13:47:24 +02:00
Simon Willnauer a2d9276674 LUCENE-8335: Enforce soft-deletes field up-front.
Soft deletes field must be marked as such once it's introduced
and can't be changed after the fact.
2018-06-02 13:14:53 +02:00
Simon Willnauer 34741a863a LUCENE-8330: Exclude MockRandomMP from basic tests 2018-05-29 16:58:03 +02:00
Simon Willnauer c93f628317 LUCENE-8330: Detach IndexWriter from MergePolicy
This change introduces a new MergePolicy.MergeContext interface
that is easy to mock and cuts over all instances of IW to MergeContext.
Since IW now implements MergeContext, the cut-over is straightforward.
This reduces the exposed API available in MP dramatically and allows
efficient testing without relying on IW to improve the coverage and
testability of our MP implementations.
2018-05-25 07:37:09 +02:00
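A sketch of the decoupling, assuming a hypothetical two-method context; Lucene's real `MergePolicy.MergeContext` takes `SegmentCommitInfo` rather than segment names and exposes other methods. The policy depends only on the interface, so a test can pass a small fake instead of an IndexWriter:

```java
import java.util.Map;
import java.util.Set;

// Hypothetical narrow context interface (segment names stand in for segment infos).
interface MergeCtx {
  int numDeletesToMerge(String segment);

  Set<String> getMergingSegments();
}

final class DeletesHeavyPolicy {
  // The policy only sees what the context exposes.
  static boolean shouldMerge(String segment, int maxDoc, MergeCtx ctx, double maxDeletesPct) {
    if (maxDoc <= 0 || ctx.getMergingSegments().contains(segment)) {
      return false; // empty, or already being merged
    }
    return 100.0 * ctx.numDeletesToMerge(segment) / maxDoc > maxDeletesPct;
  }
}

// A trivial test double backed by in-memory collections.
final class FakeMergeCtx implements MergeCtx {
  private final Map<String, Integer> deletes;
  private final Set<String> merging;

  FakeMergeCtx(Map<String, Integer> deletes, Set<String> merging) {
    this.deletes = deletes;
    this.merging = merging;
  }

  @Override
  public int numDeletesToMerge(String segment) {
    return deletes.getOrDefault(segment, 0);
  }

  @Override
  public Set<String> getMergingSegments() {
    return merging;
  }
}
```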
Simon Willnauer 70cfe46689 LUCENE-8320: Fix NPE in WindowsFS if target file exists but isn't open 2018-05-18 19:38:11 +02:00
Alan Woodward b1ee23c525 LUCENE-8273: Fix end() and posInc handling 2018-05-18 13:11:39 +01:00
Simon Willnauer 42a79970d5 LUCENE-8320: Fix WindowsFS#rename with hardlinks 2018-05-18 09:33:50 +02:00
Simon Willnauer 3fe612bed2 LUCENE-8318: Ensure pending delete is not brought back on a try delete attempt
When renaming a file, `FSDirectory#rename` tries to delete the dest file
if it's in the pending deletes list. If that delete fails, it adds the
dest to the pending deletes list again. This causes the dest file to be
deleted later by `deletePendingFiles`.
2018-05-17 11:02:35 +02:00
Adrien Grand 6d69824a6b LUCENE-8314: More checks on AssertingScorer. 2018-05-16 17:54:19 +02:00
Adrien Grand 9b9776a714 LUCENE-8313: Simplify SimScorer. 2018-05-16 17:53:56 +02:00
Simon Willnauer 585952797c LUCENE-8310: Ensure IndexFileDeleter accounts for pending deletes
Today we fail to create the IndexWriter when the directory has a
pending delete. Yet, this is mainly done to prevent writing still
existing files more than once. IndexFileDeleter already accounts for
that for existing files; we can now use it to also take pending
deletes into account, which ensures that all file generations per segment
always go forward.
2018-05-16 11:17:43 +02:00
Adrien Grand d764156f91 LUCENE-8303: Make the overflow test a Monster rather than Nightly. 2018-05-11 14:36:42 +02:00
Simon Willnauer a3c86373e4 LUCENE-8298: Allow DocValues updates to reset a value
Today once a document has a value in a certain DV field, this value
can only be changed, not removed. While resetting / removing a value
from a field is certainly a corner case, it can be used to undelete a
soft-deleted document unless it's merged away.
This allows rolling back changes without rolling back to another commit point
or trashing all uncommitted changes. In certain scenarios it can be used to
"repair" the history of documents in distributed systems.
2018-05-09 18:57:57 +02:00
Adrien Grand 8dc69428e3 LUCENE-8303: Make LiveDocsFormat only responsible for serialization/deserialization of live docs. 2018-05-09 15:40:14 +02:00
Dawid Weiss 85c00e77ef LUCENE-8267: removed references to memory codecs. 2018-05-08 10:32:11 +02:00
Adrien Grand 67c13bbe2e LUCENE-8142: Fix QueryUtils to only call getMaxScore when it is legal to do so. 2018-05-02 17:42:18 +02:00
Adrien Grand 46ecb73976 LUCENE-8142: Fix AssertingImpactsEnum and add missing javadoc. 2018-05-02 17:20:42 +02:00
Adrien Grand af680af77f LUCENE-8142: Make postings APIs expose raw impacts rather than scores. 2018-05-02 14:49:32 +02:00
Simon Willnauer 933d8a6995 LUCENE-8275: Fix BaseLockFactoryTestCase to step out on Windows if pending files are found
The particular test here is #testStressLocks, which has several protections against
WindowsFS and special logic in the catch clause that steps out on fatal exceptions with
pending deletes. Since we now check this consistently in the IW ctor, we need to also
skip this entire test if we are on Windows and have pending deletes.
2018-04-26 12:10:10 +02:00
Alan Woodward e167e91247 LUCENE-8270: Remove MatchesIterator.term() 2018-04-23 16:51:17 +01:00
Simon Willnauer 6f0a884582 LUCENE-8269: Detach downstream classes from IndexWriter
IndexWriter today is shared with many classes like BufferedUpdateStream,
DocumentsWriter and DocumentsWriterPerThread. Some of them even acquire locks
on the writer instance or assert that the current thread doesn't hold a lock.
This makes it very difficult to have a manageable threading model.

This change separates out the IndexWriter from those classes and makes them all
independent of IW. IW now implements a new interface for DocumentsWriter to communicate
on failed or successful flushes and tragic events. This allows IW to make its critical
methods private and execute all lock-critical actions on its private queue, which ensures
that the IW lock is not held. Follow-up changes will try to detach more code like
publishing flushed segments to ensure we never call back into IW in an uncontrolled way.
2018-04-23 17:17:40 +02:00
Simon Willnauer c70cceaee5 LUCENE-8253: Account for soft-deletes before they are flushed to disk
Inside IndexWriter, buffers are only written to disk if it's needed
or "worth it", which doesn't guarantee that soft deletes are accounted for
in time. This is not necessarily a problem since they are eventually
collected and segments that have soft-deletes will be merged eventually,
but for tests and for on-par behavior compared to hard deletes this behavior
is tricky.
This change cuts over to accounting in-place just like hard-deletes. This
results in accurate delete numbers for soft deletes at any given point in time
once the reader is loaded or a pending soft delete occurs.

This change also fixes an issue where all updates to a DV field are allowed
even if the field is unknown. Now this only works if the field is equal
to the soft deletes field. This behavior was never released.
2018-04-16 16:17:06 +02:00
Mike McCandless 7c0387ad3f LUCENE-8248: MergePolicyWrapper is renamed to FilterMergePolicy and now also overrides getMaxCFSSegmentSizeMB 2018-04-13 15:45:19 -04:00
Alan Woodward 040a9601b1 LUCENE-8229: Add Weight.matches() to iterate over match positions 2018-04-11 09:43:27 +01:00
Alan Woodward 798d351034 LUCENE-8242: Deprecate createNormalizedWeight 2018-04-09 15:07:04 +01:00
Simon Willnauer ed62b990d8 LUCENE-8237: Add a SoftDeletesDirectoryReaderWrapper
This adds support for soft deletes if the reader is opened from a directory.
Today we only support soft deletes for NRT readers; this change allows wrapping an
existing DirectoryReader with a SoftDeletesDirectoryReaderWrapper to also filter
out soft deletes in the case of a non-NRT reader.
2018-04-09 11:50:38 +02:00
Simon Willnauer ecc17f9023 LUCENE-8233: Add support for soft deletes to IndexWriter
This change adds support for soft deletes as a fully supported feature
by the index writer. Soft deletes are accounted for inside the index
writer and therefore also by merge policies.

This change also adds a SoftDeletesRetentionMergePolicy that allows
users to selectively carry over soft-deleted documents across merges
for retention policies. The merge policy selects documents that should
be kept around in the merged segment based on a user-provided query.
2018-04-04 13:45:14 +02:00
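The selection rule described above can be sketched over an in-memory list of documents. `RetentionSketch`, its `Doc` type, and the timestamp field are hypothetical; the `Predicate` stands in for the user-provided retention query:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

final class RetentionSketch {
  static final class Doc {
    final String id;
    final boolean softDeleted;
    final long timestamp;

    Doc(String id, boolean softDeleted, long timestamp) {
      this.id = id;
      this.softDeleted = softDeleted;
      this.timestamp = timestamp;
    }
  }

  // Live documents are always carried over; soft-deleted ones survive the
  // merge only if the retention predicate matches them.
  static List<Doc> merge(List<Doc> segmentDocs, Predicate<Doc> retain) {
    List<Doc> merged = new ArrayList<>();
    for (Doc doc : segmentDocs) {
      if (!doc.softDeleted || retain.test(doc)) {
        merged.add(doc);
      }
    }
    return merged;
  }
}
```

For example, retaining soft-deleted documents whose timestamp is at least 50 keeps a live document and a recent soft-deleted one, and drops an older soft-deleted one.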
Robert Muir e595541ef3 LUCENE-8192: always enforce index-time offsets are correct with BaseTokenStreamTestCase 2018-03-26 22:02:34 -04:00
Alan Woodward fac84c01c8 LUCENE-8202: Add FixedShingleFilter 2018-03-21 13:45:03 +00:00
Simon Willnauer 2e35ef2b3d LUCENE-8215: Fix several fragile exception handling places in o.a.l.index
Several places in the index package don't handle exceptions well or ignore them.
This change adds some utility methods and cuts over to try-with-resources blocks
to simplify exception handling.
2018-03-20 10:50:12 +01:00
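One kind of utility such a cleanup might add is a helper that closes several resources without losing failures. This is a hypothetical sketch, not the commit's code; it closes every resource and keeps later failures as suppressed exceptions:

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.List;

final class CloseAll {
  // Closes every resource even if some fail; the first failure is rethrown
  // with later ones attached as suppressed exceptions, so nothing is lost.
  static void closeAll(List<? extends Closeable> resources) throws IOException {
    IOException first = null;
    for (Closeable c : resources) {
      try {
        c.close();
      } catch (IOException e) {
        if (first == null) {
          first = e;
        } else {
          first.addSuppressed(e);
        }
      }
    }
    if (first != null) {
      throw first;
    }
  }
}
```

Where resources are opened in the same scope, a try-with-resources block gives the same suppressed-exception behavior automatically.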
Adrien Grand 3048e5da22 LUCENE-8008: Remove unintended changes. 2018-03-20 09:52:24 +01:00
Robert Muir 97299ed006 LUCENE-8191: if a tokenstream has broken offsets, it's broken. IndexWriter always checks, so a separate whitelist can't work 2018-03-04 11:23:45 -05:00
Erick ad7e94afb2 SOLR-12028: BadApple and AwaitsFix annotations usage 2018-03-03 21:42:14 -08:00
Uwe Schindler 7dba350c7a SOLR-12028: Make initialization of constants dynamic (by reading the annotation), also add missing reproduce info 2018-02-28 00:47:00 +01:00
Erick Erickson 1fe45606b9 SOLR-12028: BadApple and AwaitsFix annotations usage 2018-02-26 20:35:12 -08:00
Adrien Grand 317a2e0c3d LUCENE-8153: Make impacts checks lighter by default.
The new `-slow` switch makes checks more complete but also heavier. This
option also cross-checks term vectors.
2018-02-20 17:14:11 +01:00
Adrien Grand 4fb7e3d02c LUCENE-8135: Implement block-max WAND. 2018-02-15 15:13:58 +01:00
Alan Woodward 342e38217a LUCENE-8163: BaseDirectoryTestCase produces random filenames that fail on Windows 2018-02-09 09:14:02 +00:00
Adrien Grand f410df8113 LUCENE-4198: Give codecs the opportunity to index impacts. 2018-01-31 14:54:52 +01:00
Adrien Grand 75d50b4492 LUCENE-8116: Remove unnecessary IOException. 2018-01-11 11:49:36 +01:00
Adrien Grand 838c604b76 LUCENE-8119: Remove SimScorer.maxScore(float maxFreq). 2018-01-09 14:42:16 +01:00
Alan Woodward d250a1463d LUCENE-8133: Rename TermContext to TermStates, and load TermState lazily if term stats are not required 2018-01-05 14:17:15 +00:00
Adrien Grand 8fd7ead940 LUCENE-8116: SimScorer now only takes a frequency and a norm as per-document scoring factors. 2018-01-04 15:13:36 +01:00
Alan Woodward c1030eeb74 LUCENE-8012: Explanation takes Number rather than float 2018-01-02 11:06:59 +00:00
Adrien Grand b2f248164c LUCENE-8010: Fix similarities so that they pass tests. 2017-12-29 10:06:00 +01:00
Steve Rowe 3e2f9e62d7 LUCENE-2899: Add OpenNLP Analysis capabilities as a module 2017-12-15 11:24:18 -05:00
Adrien Grand d5c72eb588 LUCENE-8081: Remove unused import. 2017-12-08 08:45:18 +01:00
Simon Willnauer ede46fe6e9 LUCENE-8081: Allow IndexWriter to opt out of flushing on indexing threads
Index/Update Threads try to help out flushing pending document buffers to
disk. This change adds an expert setting to opt out of this behavior unless
flushing is falling behind.
2017-12-07 16:22:52 +01:00
Adrien Grand 4fc5a872de LUCENE-4100: Faster disjunctions when the hit count is not needed. 2017-12-07 10:49:39 +01:00
Adrien Grand 63b63c5734 LUCENE-8015: Fixed DFR similarities' scores to not decrease when tfn increases. 2017-12-06 18:19:57 +01:00
Adrien Grand a8a63464e7 LUCENE-7996: Queries are now required to produce positive scores. 2017-12-06 14:06:03 +01:00
Simon Willnauer 01d12777c4 LUCENE-8068: Allow IndexWriter to write a single DWPT to disk
Adds a `flushNextBuffer` method to IndexWriter that allows the caller to
synchronously move the next pending or the biggest non-pending index buffer to
disk. This enables flushing a selected buffer to disk without hijacking an
indexing thread. This is for instance useful if more than one IW (shards) must
be maintained in a single JVM / system.
2017-11-30 18:57:27 +01:00
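The buffer selection rule can be sketched as follows. `NextBufferSketch` and its `Buffer` type are hypothetical, and the sketch covers only which buffer is chosen, not the actual write to disk:

```java
import java.util.List;
import java.util.Optional;

final class NextBufferSketch {
  static final class Buffer {
    final String name;
    final long bytesUsed;
    final boolean pending; // already marked for flushing

    Buffer(String name, long bytesUsed, boolean pending) {
      this.name = name;
      this.bytesUsed = bytesUsed;
      this.pending = pending;
    }
  }

  // Prefer a buffer that is already pending; otherwise pick the largest one,
  // since flushing it frees the most memory. Empty if there are no buffers.
  static Optional<Buffer> nextToFlush(List<Buffer> buffers) {
    for (Buffer b : buffers) {
      if (b.pending) {
        return Optional.of(b);
      }
    }
    Buffer largest = null;
    for (Buffer b : buffers) {
      if (largest == null || b.bytesUsed > largest.bytesUsed) {
        largest = b;
      }
    }
    return Optional.ofNullable(largest);
  }
}
```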
Adrien Grand d27ddcb409 LUCENE-8008: Reduce leniency in CheckHits. 2017-11-29 18:09:38 +01:00
David Smiley 64d95e6a6d LUCENE-8049: IndexWriter.getMergingSegments() signature changed to return Set instead of Collection 2017-11-26 23:25:06 -05:00
Alan Woodward 183571c085 LUCENE-6278: Remove Scorer.freq() 2017-11-15 11:14:16 +00:00
Alan Woodward 276e317e94 LUCENE-8042: Add SegmentCachable interface 2017-11-10 12:17:50 +00:00
Alan Woodward 1aa049bb27 LUCENE-8014: Remove deprecated SimScorer methods 2017-11-10 09:43:18 +00:00
Alan Woodward 764abcb31a Revert "LUCENE-8014: Remove deprecated SimScorer methods"
Reverting to fix test failures

This reverts commit 946ec9d5b9.
2017-11-10 09:02:03 +00:00
Alan Woodward 946ec9d5b9 LUCENE-8014: Remove deprecated SimScorer methods 2017-11-09 14:05:34 +00:00
Alan Woodward a886a001a4 LUCENE-8017: Add Weight.getCacheHelper() 2017-11-03 10:40:14 +00:00
Robert Muir ca5f9b3457 LUCENE-8007: Make scoring statistics mandatory 2017-11-02 23:02:21 -04:00
Robert Muir 875d45ff14 LUCENE-8030: fix buggy assert 2017-10-31 22:30:33 -04:00
Robert Muir e0bde57981 LUCENE-8020: don't force sim to score bogus terms (e.g. docfreq=0) 2017-10-30 20:32:12 -04:00
Robert Muir 489ca238c4 LUCENE-8021: Add AssertingSimilarity 2017-10-30 18:38:26 -04:00
Robert Muir 42717d5f4b LUCENE-7997: More sanity testing of similarities 2017-10-24 22:48:04 -04:00
Mike McCandless ea36f5040c LUCENE-7999: upgrade int to long for tracking the counter for the next segment name to prevent overflow 2017-10-24 13:13:41 -04:00
Dawid Weiss 46cd679e91 LUCENE-7983: IndexWriter.IndexReaderWarmer is now a functional interface instead of an abstract class with a single method. 2017-10-04 10:59:16 +02:00
Nicholas Knize bf71650ad7 LUCENE-7392: Add point based LatLonBoundingBox as new RangeField Type. 2017-09-19 14:45:04 -05:00
yonik a4374e840d SOLR-11173: implement Points support in TermsComponent via PointMerger 2017-08-19 18:02:11 -04:00
Adrien Grand 9c83d025e4 LUCENE-7897: IndexOrDocValuesQuery now requires the range cost to be more than 8x greater than the cost of the lead iterator in order to use doc values. 2017-08-10 12:10:44 +02:00