11853 Commits

Author SHA1 Message Date
Simon Willnauer
bc4da80776
Fix visibility on member variables in IndexWriter and friends (#1460)
Today it looks like the Wild West inside IndexWriter and some of its
associated classes. This change makes sure all non-final members have
private visibility; methods that are not used outside of IW today are
made private unless they were already public. This change also removes
some unused or unnecessary members where possible and deletes some dead
code left over from previous refactoring.
2020-04-27 17:49:20 +02:00
Uwe Schindler
64eed9a1a6
LUCENE-9347: Add support for forbiddenapis 3.0 (#1459)
2020-04-27 11:54:59 +02:00
Alan Woodward
5d5b7e14d4 LUCENE-9314: Use SingletonDocumentBatch in monitor when we only have a single document 2020-04-27 10:41:49 +01:00
Tomoko Uchida
13bbe60333 LUCENE-9344: update file names (MIGRATE.txt, BUILD.txt => MIGRATE.md, BUILD.md) 2020-04-27 10:23:52 +09:00
Tomoko Uchida
f03e6aac59
SOLR-14429: Convert .txt files to properly formatted .md files (#1450) 2020-04-27 08:43:04 +09:00
Simon Willnauer
8059eea160
Consolidate all IW locking inside IndexWriter (#1454)
Today we still have one class that runs some tricky logic that should
be in the IndexWriter in the first place, since it requires locking on
the IndexWriter itself. This change inverts the API: FrozenBufferedUpdates
no longer gets the IndexWriter passed in; instead the IndexWriter owns most of the logic
and executes on a FrozenBufferedUpdates object. This prevents locking on IndexWriter
outside of the writer itself and paves the way to simplify some concurrency down the road.
2020-04-24 19:07:21 +02:00
Pierre-Luc Perron
013e98347a LUCENE-9267 Replace getQueryBuildTime time unit from ms to ns 2020-04-24 10:36:30 -05:00
Simon Willnauer
d7e0b906ab
LUCENE-9345: Separate MergeScheduler from IndexWriter (#1451)
This change extracts the methods that are used by MergeScheduler into
a MergeSource interface. This allows IndexWriter to better ensure
locking and hide internal methods, and removes the tight coupling between the two
complex classes. This will also improve future testing.
2020-04-24 15:02:55 +02:00
Alan Woodward
5eb117f561 LUCENE-9340: Remove deprecated SimpleBindings#add(SortField) method 2020-04-24 12:22:21 +01:00
Alan Woodward
f6462ee350
LUCENE-9340: Deprecate SimpleBindings#add(SortField) (#1447)
This method is trappy; it doesn't work for all SortField types, but doesn't tell
you that until runtime. This commit deprecates it, and removes all other
callsites in the codebase.
2020-04-24 12:08:16 +01:00
Alan Woodward
ed3caab2d8
LUCENE-9338: Clean up type safety in SimpleBindings (#1444)
Replaces SimpleBindings' Map<String, Object> with a map of
Function<Bindings, DoubleValuesSource> to improve type safety, and
reworks cycle detection and validation to avoid catching
StackOverflowError.
2020-04-24 10:23:50 +01:00
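The idea behind LUCENE-9338 can be sketched without Lucene on the classpath. The class below is a hypothetical stand-in, not Lucene's SimpleBindings: `BindingsSketch`, `add` and `resolve` are illustrative names, and `Double` replaces `DoubleValuesSource`. It shows a map of typed functions plus explicit cycle detection through a set of names currently being resolved, so a cyclic reference is reported as an ordinary exception and the stack never overflows.

```java
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

/** Hypothetical stand-in for a bindings class; this is not Lucene's SimpleBindings. */
public final class BindingsSketch {
    // Each name maps to a typed function instead of an untyped Object.
    private final Map<String, Function<BindingsSketch, Double>> map = new HashMap<>();
    // Names currently being resolved; a repeated name means a cycle.
    private final Set<String> resolving = new LinkedHashSet<>();

    public void add(String name, Function<BindingsSketch, Double> source) {
        map.put(name, source);
    }

    public double resolve(String name) {
        Function<BindingsSketch, Double> source = map.get(name);
        if (source == null) {
            throw new IllegalArgumentException("Unknown binding: " + name);
        }
        if (!resolving.add(name)) {
            // Detected explicitly, before the recursion can exhaust the stack.
            throw new IllegalArgumentException("Cycle detected: " + resolving + " -> " + name);
        }
        try {
            return source.apply(this);
        } finally {
            resolving.remove(name);
        }
    }
}
```

A binding that refers to another one, such as `b.add("b", x -> x.resolve("a") + 1.0)`, resolves normally, while two bindings that refer to each other fail with an IllegalArgumentException.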
Simon Willnauer
83018deef7 Ensure we use a sane IWC for tests adding many documents.
This test produced tons of files on nightly builds, causing
TooManyOpenFilesExceptions likely due to not using CFS on flush
and/or very small maxMergeSize values.
2020-04-24 08:36:06 +02:00
Tomoko Uchida
75b648ce82 LUCENE-9344: Use https url for lucene.apache.org 2020-04-24 14:45:34 +09:00
Tomoko Uchida
c7697b088c
LUCENE-9344: Convert .txt files to properly formatted .md files (#1449) 2020-04-24 14:28:12 +09:00
Tomas Fernandez Lobbe
a11b78e06a
LUCENE-9342: Collector's totalHitsThreshold should not be lower than numHits (#1448)
Use the maximum of the two, so that the relation is EQUAL_TO when the number of hits for a query is less than the collector's numHits.
2020-04-23 12:04:02 -07:00
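The clamping described in LUCENE-9342 amounts to a single `Math.max`. The helper below is a hypothetical illustration (`HitsThreshold` and `effectiveThreshold` are not Lucene names); it only shows the arithmetic, not how the collectors apply it.

```java
/** Hypothetical helper illustrating the clamping; not part of Lucene's API. */
public final class HitsThreshold {
    private HitsThreshold() {}

    /**
     * Returns the threshold a collector would use: never lower than numHits,
     * so a query with fewer than numHits matches is still counted exactly.
     */
    public static int effectiveThreshold(int numHits, int totalHitsThreshold) {
        return Math.max(totalHitsThreshold, numHits);
    }
}
```

With `numHits = 10` and `totalHitsThreshold = 1`, the effective threshold is 10; a requested threshold of 1000 is left unchanged.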
Simon Willnauer
4a98918bfa
LUCENE-9339: Only call MergeScheduler when we actually found new merges (#1445)
IW#maybeMerge calls the MergeScheduler even if it didn't find any merges. We should instead only do this if there is in fact anything to merge, and save the call into a sync'd method.
2020-04-22 21:26:45 +02:00
Simon Willnauer
2b6ae53cd9
LUCENE-9337: Ensure CMS updates its thread accounting datastructures consistently (#1443)
CMS today releases its lock after finishing a merge before it re-acquires it to update
the thread accounting datastructures. This causes threading issues where concurrently
finishing threads fail to pick up pending merges, causing potential thread starvation
on forceMerge calls.
2020-04-22 14:30:14 +02:00
Mike McCandless
e0c06ee6a6 LUCENE-9191: make LineFileDocs random seeking more efficient by recording safe skip points in the concatenated gzip'd chunks 2020-04-21 12:09:17 -04:00
Simon Willnauer
56c61e698c Remove dead code 2020-04-21 13:38:19 +02:00
Ignacio Vera
f914e08b36
LUCENE-9273: Speed up geometry queries by specialising Component2D spatial operations (#1341)
Speed up geometry queries by specialising Component2D spatial operations. Instead of using a generic relate method for all relations, we use specialised methods for each one. In addition, the type of triangle is computed at deserialisation time, so we can be more selective when decoding the points of a triangle.
2020-04-20 19:24:49 +02:00
Simon Willnauer
9881dc031c Fix compiler warnings in tests 2020-04-18 14:45:03 +02:00
Simon Willnauer
113043b1ed
LUCENE-9324: Add an ID to SegmentCommitInfo (#1434)
We already have IDs in SegmentInfo, as well as on SegmentInfos, which are useful to uniquely identify segments and entire commits. Having IDs on SegmentCommitInfo is useful too in
order to compare commits for equality and make snapshots incremental on generational files.
This change adds a unique ID to SegmentCommitInfo starting from Lucene 8.6. Older segments won't have an ID until the segment receives an update or a delete even if they have been opened and / or committed by Lucene 8.6 or above.
2020-04-18 14:24:57 +02:00
Erick Erickson
3af165b32a LUCENE-7788: fail precommit on unparameterised log messages and examine for wasted work/objects 2020-04-17 20:40:32 -04:00
Tommaso Teofili
243cf2c99d LUCENE-9327 - drop useless casts in BaseXYShapeTestCase 2020-04-17 14:57:18 +02:00
iverase
9340e56551 Add back-compat indices for 8.5.1 2020-04-16 09:52:08 +02:00
iverase
b7b85f3e75 Move bugfix entries to version 8.5.1 2020-04-16 09:36:55 +02:00
iverase
8a88ab0e7c Add bugfix version 8.5.1 2020-04-16 09:27:06 +02:00
Adrien Grand
0aa4ba7ccb
LUCENE-9260: Verify checksums of CFS files. (#1311) 2020-04-15 15:10:59 +02:00
Adrien Grand
aa605b3c70
LUCENE-9307: Remove the ability to set the buffer size dynamically on BufferedIndexInput (#1415) 2020-04-15 15:10:11 +02:00
Simon Willnauer
47bc18478a
Move DWPT private deletes out of FrozenBufferedUpdates (#1431)
This change moves the deletes tracked by FrozenBufferedUpdates that
are private to the DWPT and never used in a global context out of
FrozenBufferedUpdates.
2020-04-14 21:37:19 +02:00
Simon Willnauer
18af6325ed
LUCENE-9304: Fix IW#getMaxCompletedSequenceNumber() (#1427)
After recent refactoring on LUCENE-9304, `IW#getMaxCompletedSequenceNumber()` might
return values that belong to non-completed operations if a full flush is running and a new delete
queue is already in place, but not all DWPTs that participate in the full flush have finished their
in-flight operations. This caused rare failures in
`TestControlledRealTimeReopenThread#testControlledRealTimeReopenThread` where
documents are not actually visible given the max completed seqNo. This change streamlines
the delete queue advance, adds a dedicated testcase and ensures that a delete queue's
sequence ID space is never exhausted.
2020-04-14 19:39:23 +02:00
Julie Tibshirani
3236d38c8b
Avoid using a raw Arc type. (#1429)
This fixes some compiler warnings that popped up recently.
2020-04-14 09:23:12 +02:00
Simon Willnauer
f5457b82a1 Suppress Direct postings for TestIndexWriterThreadsToSegments to prevent OOM on Nightly 2020-04-13 13:44:15 +02:00
Dawid Weiss
616ec987a9
Do a bit count on 8 bytes from a long directly instead of reading 8 bytes from the reader. Byte order doesn't matter here. (#1426) 2020-04-13 13:37:25 +02:00
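The reason byte order is irrelevant here is that a population count is a sum over bits, and reordering bytes does not change the sum. The sketch below is not the Lucene code; `BitCountSketch` and its methods are illustrative names. It compares a byte-by-byte count against `Long.bitCount` on the same 8 bytes, in either byte order.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Illustrative comparison; the class and method names are assumptions, not Lucene code. */
public final class BitCountSketch {
    private BitCountSketch() {}

    /** Serialises a long into 8 bytes using the given byte order. */
    public static byte[] toBytes(long value, ByteOrder order) {
        return ByteBuffer.allocate(Long.BYTES).order(order).putLong(value).array();
    }

    /** Counts set bits one byte at a time, as a byte-wise reader would. */
    public static int countBytewise(byte[] bytes) {
        int count = 0;
        for (byte b : bytes) {
            count += Integer.bitCount(b & 0xFF); // mask to avoid sign extension
        }
        return count;
    }

    /** Counts set bits on the whole 64-bit value in one call. */
    public static int countFromLong(long value) {
        return Long.bitCount(value);
    }
}
```

For any `long` value, `countFromLong(value)` equals `countBytewise(toBytes(value, order))` for both `ByteOrder.BIG_ENDIAN` and `ByteOrder.LITTLE_ENDIAN`.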
Shalin Shekhar Mangar
13f19f6555 SOLR-9906: SolrjNamedThreadFactory is deprecated in favor of SolrNamedThreadFactory. DefaultSolrThreadFactory is removed from solr-core in favor of SolrNamedThreadFactory in solrj package and all solr-core classes now use SolrNamedThreadFactory 2020-04-13 08:16:35 +05:30
Simon Willnauer
8c1f9815db LUCENE-9309: ensure stopMerges is set under IW lock 2020-04-11 19:53:21 +02:00
Simon Willnauer
2602269f3e
LUCENE-9304: Refactor DWPTPool to pool DWPT directly (#1397)
This change removes the ThreadState indirection from DWPTPool and pools DWPT directly. The tracking information and locking semantics are mostly moved to DWPT directly, and the pool semantics have changed slightly such that DWPTs need to be checked out of the pool once they need to be flushed or aborted. This automatically grows and shrinks the number of DWPTs in the system when the number of threads grows or shrinks. Access to pooled DWPTs is more straightforward and doesn't require an ordinal. Instead consumers can just iterate over the elements in the pool.
This allowed for removal of indirections in DWPTFlushControl like BlockedFlush, the removal of DWPTPool setter and getter in IndexWriterConfig and the addition of stronger assertions in DWPT and DW.
2020-04-11 12:23:46 +02:00
Nhat Nguyen
527e651660 LUCENE-9298: Fix TestBufferedUpdates
This test failed on Elastic CI because we did not add any term in the
loop. This commit ensures that we always add at least one docId, term
and query in the test.
2020-04-10 15:28:10 -04:00
Simon Willnauer
e376582e25
LUCENE-9309: Wait for #addIndexes merges when aborting merges (#1418)
The SegmentMerger usage in IW#addIndexes(CodecReader...) might make changes
to the Directory while the IW tries to clean-up files on rollback. This
causes issues like FileNotFoundExceptions when IDF tries to remove temp files.
This change adds a waiting mechanism to the abortMerges method that, in addition
to the running merges, also waits for merges in addIndexes(CodecReader...).
2020-04-10 12:55:02 +02:00
YuBinglei
2935186c5b
LUCENE-9298: Improve RAM accounting in BufferedUpdates when deleted doc IDs and terms are cleared (#1389) 2020-04-10 12:30:47 +02:00
Bruno Roustant
6bba35a709
LUCENE-9286: FST.Arc.BitTable reads FST bytes directly. Arc is lightweight again and FSTEnum traversal is faster. 2020-04-09 10:36:37 +02:00
Juan Camilo Rodriguez Duran
de6233976a LUCENE-8050: PerFieldDocValuesFormat should not get the DocValuesFormat on a field that has no doc values.
Closes #1408
2020-04-07 16:12:05 -04:00
Adrien Grand
529042e786 LUCENE-9271: Complete fix for setBufferSize. 2020-04-07 17:24:41 +02:00
Adrien Grand
3363e1aa48 LUCENE-9271: Fix bad assertion. 2020-04-07 16:21:33 +02:00
Adrien Grand
82692e76e0 LUCENE-9271: Move BufferedIndexInput to the ByteBuffer API.
Closes #1338
2020-04-07 13:30:09 +02:00
Ignacio Vera
f018c4c813
LUCENE-9244: In 2D, a point can be shared by four leaves (#1279)
Adjust TestLucene60PointsFormat#testEstimatePointCount2Dims so it does not fail when a point is shared by multiple leaves
2020-04-07 10:41:15 +02:00
Erick Erickson
e1e2085e94 SOLR-14386: Update Jetty to 9.4.27 and dropwizard-metrics version to 4.1.5 2020-04-04 16:14:57 -04:00
Jim Ferenczi
b5c5ebe37c
LUCENE-9300: Fix field infos update on doc values update (#1394)
Today a doc values update creates a new field infos file that contains the original field infos updated for the new generation as well as the new fields created by the doc values update.

However existing fields are cloned through the global fields (shared in the index writer) instead of the local ones (present in the segment).
In practice this is not an issue since field numbers are shared between segments created by the same index writer.
But this assumption doesn't hold for segments created by different writers and added through IndexWriter#addIndexes(Directory).
In this case, the field number of the same field can differ between segments so any doc values update can corrupt the index
by assigning the wrong field number to an existing field in the next generation.

When this happens, queries and merges can access wrong fields without throwing any error, leading to a silent corruption in the index.

This change ensures that we preserve local field numbers when creating
a new field infos generation.
2020-04-03 13:58:05 +02:00
Atri Sharma
d6cef4f39c Update CHANGES.txt 2020-04-01 20:56:19 +05:30
Atri Sharma
9ed71a6efe
LUCENE-9074: Slice Allocation Control Plane For Concurrent Searches (#1294)
This commit introduces a mechanism to control allocation of threads to slices planned for a query.
The default implementation uses the size of the executor's backlog queue to determine whether a slice should be allocated a new thread.
2020-04-01 20:42:26 +05:30
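A backlog-based decision of the kind LUCENE-9074 describes can be sketched with `java.util.concurrent` alone. This is not Lucene's implementation: `BacklogAwareRunner`, `shouldUseExecutor`, `run` and the `maxBacklog` limit are assumptions made for illustration. When the executor's queue is already long, the task runs on the calling thread instead of being queued.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ThreadPoolExecutor;

/** Illustrative sketch; the names and the maxBacklog parameter are assumptions. */
public final class BacklogAwareRunner {
    private BacklogAwareRunner() {}

    /** True if the executor's backlog is small enough to accept another task. */
    public static boolean shouldUseExecutor(ThreadPoolExecutor executor, int maxBacklog) {
        return executor.getQueue().size() < maxBacklog;
    }

    /** Runs the task on the executor if it has capacity, otherwise on the caller thread. */
    public static <T> T run(ThreadPoolExecutor executor, int maxBacklog, Callable<T> task)
            throws Exception {
        if (shouldUseExecutor(executor, maxBacklog)) {
            return executor.submit(task).get(); // blocks until the pooled thread finishes
        }
        return task.call(); // backlog too large: do the work on the calling thread
    }
}
```

Falling back to the caller thread keeps a saturated pool from accumulating an ever-growing queue of slices, at the cost of less parallelism for that query.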