lucene

Commit Graph

Author	SHA1	Message	Date
Ignacio Vera	0cef29f138	LUCENE-9417: Tessellator might fail when several holes share are connected to the same vertex (#1614 )	2020-06-29 17:46:21 +02:00
Simon Willnauer	7f352a9665	LUCENE-8962: Merge small segments on commit (#1617 ) Add IndexWriter merge-on-commit feature to selectively merge small segments on commit, subject to a configurable timeout, to improve search performance by reducing the number of small segments for searching. Co-authored-by: Michael Froh <msfroh@apache.org> Co-authored-by: Michael Sokolov <sokolov@falutin.net> Co-authored-by: Mike McCandless <mikemccand@apache.org>	2020-06-27 22:25:45 +02:00
Mayya Sharipova	b0333ab5c8	LUCENE-9280: Collectors to skip noncompetitive documents (#1351 ) Similar how scorers can update their iterators to skip non-competitive documents, collectors and comparators should also provide and update iterators that allow them to skip non-competive documents.	2020-06-23 16:04:58 -04:00
Tomas Fernandez Lobbe	4774c6f0c1	Include delegate in AssertingSimilarity toString (#1596 )	2020-06-22 16:38:00 -07:00
Michael Sokolov	5d43e73c66	Revert "LUCENE-8962: add ability to selectively merge on commit (#1552 )" This reverts commit `972c84022e`.	2020-06-22 17:35:49 -04:00
Michael Sokolov	972c84022e	LUCENE-8962: add ability to selectively merge on commit (#1552 ) Co-authored-by: Michael Froh <msfroh@apache.org> Co-authored-by: Simon Willnauer <simonw@apache.org>	2020-06-18 16:56:29 -04:00
Adrien Grand	87a3bef50f	LUCENE-9353: Move terms metadata to its own file. (#1473 )	2020-06-16 15:05:28 +02:00
Michael Sokolov	26075fc1dc	LUCENE-9394: fix and suppress warnings (#1563 ) * LUCENE-9394: fix and suppress warnings in lucene/* * Change type of ValuesSource context from raw Map to Map<Object, Object>	2020-06-12 07:25:31 -04:00
Bruno Roustant	75d25ad677	LUCENE-9397: UniformSplit supports encodable fields metadata.	2020-06-11 18:19:48 +02:00
Adrien Grand	54c5dd7d6d	LUCENE-9148: Move the BKD index to its own file. (#1475 )	2020-06-09 09:59:14 +02:00
Alan Woodward	de2bad9039	LUCENE-9330: Make SortFields responsible for index sorting and serialization (#1440 ) This commit adds a new class IndexSorter which handles how a sort should be applied to documents in an index: * how to serialize/deserialize sort info in the segment header * how to sort documents within a segment * how to sort documents from merging segments SortField has a getIndexSorter() method, which will return null if the sort cannot be used to sort an index (eg if it uses scores or other query-dependent values). This also requires a new Codec as there is a change to the SegmentInfoFormat	2020-05-22 13:33:06 +01:00
Erick Erickson	21b08d5cab	LUCENE-9376: Fix or suppress 20 resource leak precommit warnings in lucene/search	2020-05-21 20:29:18 -04:00
Uwe Schindler	06df50e759	LUCENE-9321: Port markdown task to Gradle (#1477 )	2020-05-17 14:46:26 +02:00
Mike McCandless	1783c4ad47	LUCENE-9191: ensure LineFileDocs random seeking effort does not seek into the middle of a multi-byte UTF-8 encoded Unicode character	2020-05-04 13:29:00 -04:00
Simon Willnauer	207d240ae2	Fix tests to survive nightly runs with many documents	2020-04-29 22:11:42 +02:00
Simon Willnauer	bc4da80776	Fix visibility on member variables in IndexWriter and friends (#1460 ) Today it looks like wild wild west inside IndexWriter and some of it's associated classes. This change makes sure all non-final members have private visibility, methods that are not used outside of IW today are made private unless they have been public. This change also removes some unused or unnecessary members where possible and deleted some dead code from previous refactoring.	2020-04-27 17:49:20 +02:00
Simon Willnauer	d7e0b906ab	LUCENE-9345: Separate MergeSchedulder from IndexWriter (#1451 ) This change extracts the methods that are used by MergeScheduler into a MergeSource interface. This allows IndexWriter to better ensure locking, hide internal methods and removes the tight coupling between the two complex classes. This will also improve future testing.	2020-04-24 15:02:55 +02:00
Simon Willnauer	83018deef7	Ensure we use a sane IWC for tests adding many documents. This test produced tons of files on nighly builds causing TooManyOpenFilesExceptions likely due to not using CFS on flush and/or very small maxMergeSize values.	2020-04-24 08:36:06 +02:00
Simon Willnauer	4a98918bfa	LUCENE-9339: Only call MergeScheduler when we actually found new merges (#1445 ) IW#maybeMerge calls the MergeScheduler even if it didn't find any merges we should instead only do this if there is in-fact anything there to merge and safe the call into a sync'd method.	2020-04-22 21:26:45 +02:00
Mike McCandless	e0c06ee6a6	LUCENE-9191: make LineFileDocs random seeking more efficient by recording safe skip points in the concatenated gzip'd chunks	2020-04-21 12:09:17 -04:00
Simon Willnauer	113043b1ed	LUCENE-9324: Add an ID to SegmentCommitInfo (#1434 ) We already have IDs in SegmentInfo, as well as on SegmentInfos which are useful to uniquely identify segments and entire commits. Having IDs on SegmentCommitInfo is be useful too in order to compare commits for equality and make snapshots incremental on generational files. This change adds a unique ID to SegmentCommitInfo starting from Lucene 8.6. Older segments won't have an ID until the segment receives an update or a delete even if they have been opened and / or committed by Lucene 8.6 or above.	2020-04-18 14:24:57 +02:00
Adrien Grand	0aa4ba7ccb	LUCENE-9260: Verify checksums of CFS files. (#1311 )	2020-04-15 15:10:59 +02:00
Simon Willnauer	2602269f3e	LUCENE-9304: Refactor DWPTPool to pool DWPT directly (#1397 ) This change removes the ThreadState indirection from DWPTPool and pools DWPT directly. The tracking information and locking semantics are mostly moved to DWPT directly and the pool semantics have changed slightly such that DWPT need to be checked-out in the pool once they need to be flushed or aborted. This automatically grows and shrinks the number of DWPT in the system when number of threads grow or shrink. Access of pooled DWPTs is more straight forward and doesn't require ordinal. Instead consumers can just iterate over the elements in the pool. This allowed for removal of indirections in DWPTFlushControl like BlockedFlush, the removal of DWPTPool setter and getter in IndexWriterConfig and the addition of stronger assertions in DWPT and DW.	2020-04-11 12:23:46 +02:00
Bruno Roustant	c7cf9e8e4f	LUCENE-9254: UniformSplit supports FST off-heap. Closes #1301	2020-03-09 16:35:42 +01:00
Michael Sokolov	4501b3d3fd	Revert "LUCENE-8962: Split test case (#1313 )" This reverts commit `90aced5a51`. Revert "LUCENE-8962: woops, remove leftover accidental copyright (darned IDEs)" This reverts commit `3dbfd10279`. Revert "LUCENE-8962: Fix intermittent test failures" This reverts commit `a5475de57f`. Revert "LUCENE-8962: Add ability to selectively merge on commit (#1155)" This reverts commit `a1791e7714`.	2020-03-08 18:27:54 -04:00
Bruno Roustant	c73d2c15ba	LUCENE-9257: Always keep FST off-heap. Remove SegmentReadState.openedFromWriter.	2020-03-06 14:24:12 +01:00
Robert Muir	624f5a3c2f	LUCENE-9264: Remove SimpleFSDirectory in favor of NIOFSDirectory Closes #1321	2020-03-06 05:42:22 -05:00
Bruno Roustant	9733643466	LUCENE-9257: Always keep FST off-heap. Remove FSTLoadMode and Reader attributes. Closes #1320	2020-03-06 11:15:09 +01:00
Yannick Welsch	8a88dd02c6	Remove SimpleFSDirectory in favor of NIOFSDirectory	2020-03-06 00:04:25 +01:00
Ignacio Vera	c313365c5f	LUCENE-9251: Filter equal edges with different value on isEdgeFromPolygon (#1290 ) Fix bug in the polygon tessellator where edges with different value on #isEdgeFromPolygon were bot filtered out properly	2020-03-03 07:07:34 +01:00
msfroh	043c5dff6f	LUCENE-8962: Add ability to selectively merge on commit (#1155 ) * LUCENE-8962: Add ability to selectively merge on commit This adds a new "findCommitMerges" method to MergePolicy, which can specify merges to be executed before the IndexWriter.prepareCommitInternal method returns. If we have many index writer threads, they will flush their DWPT buffers on commit, resulting in many small segments, which can be merged before the commit returns. * Add missing Javadoc * Fix incorrect comment * Refactoring and fix intermittent test failure 1. Made some changes to the callback to update toCommit, leveraging SegmentInfos.applyMergeChanges. 2. I realized that we'll never end up with 0 registered merges, because we throw an exception if we fail to register a merge. 3. Moved the IndexWriterEvents.beginMergeOnCommit notification to before we call MergeScheduler.merge, since we may not be merging on another thread. 4. There was an intermittent test failure due to randomness in the time it takes for merges to complete. Before doing the final commit, we wait for pending merges to finish. We may still end up abandoning the final merge, but we can detect that and assert that either the merge was abandoned (and we have > 1 segment) or we did merge down to 1 segment. * Fix typo * Fix/improve comments based on PR feedback * More comment improvements from PR feedback * Rename method and add new MergeTrigger 1. Renamed findCommitMerges -> findFullFlushMerges. 2. Added MergeTrigger.COMMIT, passed to findFullFlushMerges and to MergeScheduler when merging on commit. * Update renamed method name in strings and comments	2020-03-02 12:19:47 -05:00
Adrien Grand	f4b5069c1f	LUCENE-9247: Fix test failures with ExtraFS.	2020-03-02 08:59:13 +01:00
Adrien Grand	c929b65c81	LUCENE-9247: Exclude `write.lock` from files whose integrity is expected to be verified.	2020-02-29 08:46:16 +01:00
Adrien Grand	30944d3520	LUCENE-9247: Fix class visibility issue discovered when backporting.	2020-02-28 15:34:36 +01:00
Adrien Grand	cd984e2dc0	LUCENE-9247: Add tests for `checkIntegrity`. (#1284 ) This adds a test to `BaseIndexFileFormatTestCase` that the combination of opening a reader and calling `checkIntegrity` on it reads all bytes of all files (including index headers and footers). This would help detect most cases when `checkIntegrity` is not implemented correctly.	2020-02-28 14:19:56 +01:00
Ignacio Vera	88dd1c3f3d	LUCENE-9238: Add new XYPointField, queries and sorting capabilities (#1272 ) New XYPointField field and Queries for indexing, searching and sorting cartesian points.	2020-02-21 11:26:30 +01:00
Ignacio Vera	d48bafb299	LUCENE-8707: Add LatLonShape and XYShape distance query (#587 )	2020-02-19 16:03:30 +01:00
markharwood	79a4a680e7	Test fix - new binary doc values test could use invalid values.	2020-02-19 09:14:14 +00:00
markharwood	ce2959fe4c	LUCENE-9211 Add compression for Binary doc value fields (#1234 ) Stores groups of 32 binary doc values in LZ4-compressed blocks.	2020-02-18 14:02:42 +00:00
Robert Muir	f41eabdc5f	LUCENE-8279: fix javadocs wrong header levels and accessibility issues Java 13 adds a new doclint check under "accessibility" that the html header nesting level isn't crazy. Many are incorrect because the html4-style javadocs had horrible font-sizes, so developers used the wrong header level to work around it. This is no issue in trunk (always html5). Java recommends against using such structured tags at all in javadocs, but that is a more involved change: this just "shifts" header levels in documents to be correct.	2020-02-08 10:00:00 -05:00
Nicholas Knize	206a70e7b7	LUCENE-9149: Increase data dimension limit in BKD	2020-02-07 16:08:14 -06:00
Robert Muir	0d339043e3	LUCENE-9209: fix javadocs to be html5, enable doclint html checks, remove jtidy Current javadocs declare an HTML5 doctype: !DOCTYPE HTML. Some HTML5 features are used, but unfortunately also some constructs that do not exist in HTML5 are used as well. Because of this, we have no checking of any html syntax. jtidy is disabled because it works with html4. doclint is disabled because it works with html5. our docs are neither. javadoc "doclint" feature can efficiently check that the html isn't crazy. we just have to fix really ancient removed/deprecated stuff (such as use of tt tag). This enables the html checking in both ant and gradle. The docs are fixed via straightforward transformations. One exception is table cellpadding, for this some helper CSS classes were added to make the transition easier (since it must apply padding to inner th/td, not possible inline). I added TODOs, we should clean this up. Most problems look like they may have been generated from a GUI or similar and not a human.	2020-02-06 22:30:52 -05:00
Adrien Grand	136dcbdbbc	LUCENE-9147: Move the stored fields index off-heap. (#1179 ) This replaces the index of stored fields and term vectors with two `DirectMonotonic` arrays. `DirectMonotonicWriter` requires to know the number of values to write up-front, so incoming doc IDs and file pointers are buffered on disk using temporary files that never get fsynced, but have index headers and footers to make sure any corruption in these files wouldn't propagate to the index. `DirectMonotonicReader` gets a specialized `binarySearch` implementation that leverages the metadata in order to avoid going to the IndexInput as often as possible. Actually in the common case, it would only go to a single sub `DirectReader` which, combined with the size of blocks of 1k values, helps bound the number of page faults to 2.	2020-02-05 18:35:08 +01:00
Mike McCandless	47386f8cca	LUCENE-9200: consistently use double (not float) math for TieredMergePolicy's decisions, to fix a corner-case bug uncovered by randomized tests	2020-02-05 09:51:31 -05:00
Robert Muir	9ceaff913e	LUCENE-9195: more slow tests fixes	2020-01-31 07:57:34 -05:00
Robert Muir	29469b454f	LUCENE-9192: speed up more slow tests	2020-01-29 14:31:32 -05:00
Robert Muir	3bcc97c8eb	LUCENE-9186: remove linefiledocs usage from BaseTokenStreamTestCase	2020-01-28 11:55:51 -05:00
Robert Muir	975df9ddd3	LUCENE-9182: add apache license headers to all .gradle files and enforce in rat task	2020-01-27 12:05:34 -05:00
Robert Muir	fddb5314fc	LUCENE-9172: nuke some compiler warnings	2020-01-27 06:08:30 -05:00
Robert Muir	c53cc3edaf	LUCENE-9167: test speedup for slowest/pathological tests (round 3)	2020-01-24 08:58:59 -05:00

1 2 3 4 5 ...

1617 Commits