The class `BooleanWeight` takes a `BooleanQuery` (a list of `BooleanClause`s) as input and maintains a list of weights corresponding to the clauses. The clauses and the weights are iterated in parallel in various places throughout the class. At these code locations, it is not obvious that these two lists always have the same length, i.e., that the parallel iteration is safe. Moreover, the parallel iteration is not well supported by the Java language, which is why this operation is implemented differently throughout the code.
This patch joins the two lists to enable parallel iteration without managing two separate lists. This makes the code’s intent more obvious and prevents bugs due to the lists getting out of sync by a future change.
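A minimal sketch of the idea, not the actual patch: the `Clause`, `Weight` and `WeightedClause` types below are hypothetical stand-ins for illustration. Each clause is stored together with its weight in one list, so a single loop replaces the parallel iteration and the two cannot get out of sync.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in types; these are not Lucene's BooleanClause or Weight.
final class PairedClauses {
    static final class Clause {
        final String occur;
        final String query;
        Clause(String occur, String query) { this.occur = occur; this.query = query; }
    }

    static final class Weight {
        final String description;
        Weight(String description) { this.description = description; }
    }

    /** One entry holds a clause together with the weight created for it. */
    static final class WeightedClause {
        final Clause clause;
        final Weight weight;
        WeightedClause(Clause clause, Weight weight) { this.clause = clause; this.weight = weight; }
    }

    /** Builds the joined list once; there is no second list to keep aligned. */
    static List<WeightedClause> build(List<Clause> clauses) {
        List<WeightedClause> joined = new ArrayList<>(clauses.size());
        for (Clause c : clauses) {
            joined.add(new WeightedClause(c, new Weight("weight(" + c.query + ")")));
        }
        return joined;
    }

    /** A single loop visits each clause and its weight together. */
    static String describe(List<WeightedClause> joined) {
        StringBuilder sb = new StringBuilder();
        for (WeightedClause wc : joined) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(wc.clause.occur).append(':').append(wc.weight.description);
        }
        return sb.toString();
    }
}
```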
The UH now detects when parts of the query are not understood by it.
When such parts are found, it highlights more safely and reliably.
Fixes compatibility with complex and surround query parsers.
Today, if an executor was added to the IndexSearcher, it's impossible to
clone the searcher with its cache, similarity and caching policy since
the executor is not exposed. This adds a simple getter to make cloning
easier.
In order to simplify testing this change moves to use the Executor
interface instead of ExecutorService. This change also simplifies
customizing execute methods for use-cases that need to add additional
logic for forking to new threads. This change also adds a test for
the optimization added in LUCENE-8865.
This change is fully backwards compatible since ExecutorService implements
Executor.
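To illustrate why depending on Executor is convenient for tests, here is a small sketch. `ExecutorHolder` is a hypothetical class, not IndexSearcher. Because it only needs `Executor#execute`, a test can pass a same-thread executor such as `Runnable::run`, while existing callers can keep passing an ExecutorService.

```java
import java.util.List;
import java.util.concurrent.Executor;

// Hypothetical holder used only for illustration; not IndexSearcher.
final class ExecutorHolder {
    private final Executor executor;

    ExecutorHolder(Executor executor) {
        this.executor = executor;
    }

    /** Simple getter so that callers can reuse the same executor elsewhere. */
    Executor getExecutor() {
        return executor;
    }

    /** Hands every task to the configured executor. */
    void runAll(List<Runnable> tasks) {
        for (Runnable task : tasks) {
            executor.execute(task);
        }
    }
}
```

With `new ExecutorHolder(Runnable::run)` every task runs on the calling thread, which keeps a test deterministic without starting a thread pool.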
Today we don't utilize the incoming thread for a search when IndexSearcher
has an executor. This thread is only idling but can be used to execute a search
once all other collectors are dispatched.
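A rough sketch of the approach, under the assumption that the work is split into independent tasks. `CallerRunsLast` and `invokeAll` are illustrative names, not the IndexSearcher code. All tasks but the last are forked to the executor, and the calling thread runs the last one itself instead of idling.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Executor;
import java.util.concurrent.FutureTask;

// Illustrative helper; the names are assumptions, not Lucene API.
final class CallerRunsLast {
    /** Forks all but the last task and runs the last one on the calling thread. */
    static <T> List<T> invokeAll(Executor executor, List<Callable<T>> tasks) {
        List<FutureTask<T>> futures = new ArrayList<>(tasks.size());
        for (int i = 0; i < tasks.size(); i++) {
            FutureTask<T> future = new FutureTask<>(tasks.get(i));
            futures.add(future);
            if (i == tasks.size() - 1) {
                future.run(); // everything else is dispatched, so the caller works too
            } else {
                executor.execute(future);
            }
        }
        List<T> results = new ArrayList<>(futures.size());
        for (FutureTask<T> future : futures) {
            try {
                results.add(future.get()); // waits for the forked tasks to finish
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new RuntimeException(e);
            } catch (ExecutionException e) {
                throw new RuntimeException(e.getCause());
            }
        }
        return results;
    }
}
```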
FileSwitchDirectory fails if the tmp file is not in the same directory
as the file it's renamed to. This is correct behavior but breaks with
tmp files used with index sorting. This change makes a best effort to find
the right extension directory if the file ends with `.tmp`.
FileSwitchDirectory splits file actions between 2 directories based
on file extensions. The extensions are respected on write operations
like delete or create but ignored when we list the content of the
directories. Until now we only deduplicated the contents on
Directory#listAll, which can cause inconsistencies and hard-to-debug
errors due to double deletions in IndexWriter if a file is pending
delete in one of the directories but still shows up in the directory
listing from the other directory. This case can happen if both
directories point to the same underlying FS directory which is a
common use-case to split between mmap and NIOFS.
This change filters out files from directories depending on their
file extension to make sure files that are deleted in one directory
are not returned from another if they point to the same FS directory.
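The filtering can be sketched as follows. This is a simplified illustration that works on plain lists of file names; `ExtensionFilteredListing` and its parameters are hypothetical and do not mirror the FileSwitchDirectory code. Each directory only contributes the files whose extension it is responsible for.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Simplified illustration on plain file-name lists; names are assumptions.
final class ExtensionFilteredListing {
    /** Returns the text after the last dot, or "" if there is no dot. */
    static String extension(String name) {
        int i = name.lastIndexOf('.');
        return i == -1 ? "" : name.substring(i + 1);
    }

    /**
     * Merges two listings, keeping from each directory only the files it owns:
     * the primary directory owns primaryExtensions, the secondary owns the rest.
     */
    static List<String> listAll(Set<String> primaryExtensions,
                                List<String> primaryFiles,
                                List<String> secondaryFiles) {
        Set<String> merged = new TreeSet<>();
        for (String file : primaryFiles) {
            if (primaryExtensions.contains(extension(file))) {
                merged.add(file);
            }
        }
        for (String file : secondaryFiles) {
            if (!primaryExtensions.contains(extension(file))) {
                merged.add(file);
            }
        }
        return new ArrayList<>(merged);
    }
}
```

If a file owned by the secondary directory is pending deletion there, it is hidden from the secondary listing, and the filter prevents it from leaking back in through the primary listing.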
Today in the method IOUtils#fsync we ignore IOExceptions when fsyncing a
directory. However, the catch block here is too broad, for example it
would be ignoring IOExceptions when we try to open a non-existent
file. This commit addresses that by scoping the ignored exceptions only
to the invocation of FileChannel#force. This prevents us from
suppressing an exception in case we run into an unexpected issue when
opening the file.
However, fsyncing directories on Windows is not possible. We always
suppressed this by allowing that an AccessDeniedException is thrown when
attempting to open the directory for reading. Yet, per the above, this
suppression also allowed other IOExceptions to be suppressed, and that
should be considered a bug (e.g., not only the directory not existing,
but any filesystem error and other reasons that we might get an access
denied there, like genuine permissions issues). Rather than relying on
exceptions for flow control and continuing to suppress there, we simply
return early if attempting to fsync a directory on Windows (we should
not put this burden on the caller).
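The resulting control flow can be sketched like this. It is an illustration rather than the exact IOUtils code: `FsyncSketch` is a hypothetical class, and the `isWindows` flag is passed in as a parameter here only to keep the sketch self-contained. Exceptions from opening the file propagate; only a failure of `FileChannel#force` on a directory is ignored.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Illustrative sketch; isWindows is an assumed parameter, not the real signature.
final class FsyncSketch {
    static void fsync(Path path, boolean isDir, boolean isWindows) throws IOException {
        if (isDir && isWindows) {
            return; // fsyncing a directory is not possible on Windows, so return early
        }
        // Opening may throw (e.g. the path does not exist); this is not suppressed.
        try (FileChannel channel =
                 FileChannel.open(path, isDir ? StandardOpenOption.READ : StandardOpenOption.WRITE)) {
            try {
                channel.force(true);
            } catch (IOException e) {
                if (!isDir) {
                    throw e; // failures on regular files are always reported
                }
                // Only a failure of force() on a directory is ignored.
            }
        }
    }
}
```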
This specific commit affects all points in the codebase where the argument of a StringBuilder.append() call is itself a regular String concatenation.
This defeats the purpose of using StringBuilder and also introduces an extra allocation.
These changes should avoid that.
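For illustration, the pattern and its replacement look roughly like this (`AppendChaining` and its method names are made up for the example). Both methods produce the same string, but the first builds a temporary String before appending it.

```java
// Made-up example class; shows the pattern before and after the change.
final class AppendChaining {
    /** Before: the argument is concatenated into a temporary String first. */
    static String concatInsideAppend(String field, int doc) {
        StringBuilder sb = new StringBuilder();
        sb.append("field=" + field + " doc=" + doc);
        return sb.toString();
    }

    /** After: each part is appended directly to the builder. */
    static String chainedAppend(String field, int doc) {
        StringBuilder sb = new StringBuilder();
        sb.append("field=").append(field).append(" doc=").append(doc);
        return sb.toString();
    }
}
```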
ant tests ran and succeeded on the local machine.
Removing test files from the changes.
Another suggested rework.
This commit introduces a new DocValues field and corresponding
range query for binary ranges. These classes are extended into
concrete implementations for each of Int, Long, Float and Double
range fields.
This change adds a static method FeatureField#newDoubleValues() which can be used to retrieve the values of a feature for documents directly rather than having to store the values in a numeric field alongside the feature field.
Today we don't have a strong protection that we add and apply deletes / updates
on or from an already flushed delete queue. DWPTDeleteQueue instances are replaced
once we do a full flush in order to reopen an NRT reader or commit the IndexWriter.
In LUCENE-8813 we tripped an assert that used to protect us from such a situation
but it didn't take all corner cases from concurrent flushing into account. This change
adds a stronger protection and ensures that we neither apply a closed delete queue nor
add any updates or deletes to it.
This change also allows speculatively freezing the global buffer, which might now return
null if the queue has already been closed. This is now possible since we ensure that
we never see modifications to the queue after it's been closed and that happens right after
the last DWPT for the ongoing full flush is done flushing.
The part of speech tag for unigram has been changed inadvertently in a previous commit (not released).
This change restores the original value that is also set on the serialized unknown dictionary.
This test hangs until it times out when an assertion is tripped
in the indexing thread. Counting down the latch in a finally block
will cause the test to fail earlier.
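A small sketch of the pattern, with hypothetical names (`LatchInFinally`, `runAndAwait`) rather than the test's actual code. Because the latch is counted down in a finally block, the waiting thread is released even when the worker trips an assertion, and the failure can be reported right away.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical helper for illustration only.
final class LatchInFinally {
    /** Runs work on a new thread and returns the failure it raised, or null. */
    static Throwable runAndAwait(Runnable work, long timeoutMillis) {
        CountDownLatch done = new CountDownLatch(1);
        AtomicReference<Throwable> failure = new AtomicReference<>();
        Thread worker = new Thread(() -> {
            try {
                work.run();
            } catch (Throwable t) {
                failure.set(t); // remember the tripped assertion
            } finally {
                done.countDown(); // always release the waiting thread
            }
        });
        worker.start();
        try {
            if (!done.await(timeoutMillis, TimeUnit.MILLISECONDS)) {
                throw new IllegalStateException("timed out waiting for the worker");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return failure.get();
    }
}
```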
The current slicing algorithm assigns a thread per segment, which
can be detrimental to performance in case the distribution has
a large number of small segments. The patch introduces a slicing
algorithm which coalesces smaller segments to a single thread,
thus reducing the impact of context switching by limiting the
number of threads.
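One way to coalesce segments is sketched below. This is an assumption-laden illustration, not necessarily the algorithm in the patch: `SliceSketch`, `maxDocsPerSlice` and `maxSegmentsPerSlice` are hypothetical names, and segments are represented only by their document counts. Large segments get their own slice, while smaller ones are grouped until a slice is full.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Illustrative sketch; thresholds and names are assumptions.
final class SliceSketch {
    static List<List<Integer>> slices(List<Integer> segmentDocCounts,
                                      int maxDocsPerSlice,
                                      int maxSegmentsPerSlice) {
        List<Integer> sorted = new ArrayList<>(segmentDocCounts);
        sorted.sort(Comparator.reverseOrder()); // largest segments first

        List<List<Integer>> slices = new ArrayList<>();
        List<Integer> current = new ArrayList<>();
        long docsInCurrent = 0;
        for (int docCount : sorted) {
            if (docCount > maxDocsPerSlice) {
                // A large segment is worth a thread of its own.
                List<Integer> single = new ArrayList<>();
                single.add(docCount);
                slices.add(single);
                continue;
            }
            current.add(docCount);
            docsInCurrent += docCount;
            // Close the slice once it holds enough documents or enough segments.
            if (docsInCurrent > maxDocsPerSlice || current.size() >= maxSegmentsPerSlice) {
                slices.add(current);
                current = new ArrayList<>();
                docsInCurrent = 0;
            }
        }
        if (!current.isEmpty()) {
            slices.add(current);
        }
        return slices;
    }
}
```

With seven segments of very different sizes, this yields far fewer slices than one per segment, which is the effect the change is after.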
Signed-off-by: Adrien Grand <jpountz@gmail.com>
For boolean queries, we should eliminate redundant SHOULD clauses during
query rewrite and not build the scorer supplier, as opposed to
eliminating them during weight construction.
Signed-off-by: jimczi <jimczi@apache.org>
Ensure new threadstates are locked before retrieving the
number of active threadstates. Failing to do so causes assertion errors
and potentially broken field attributes in the IndexWriter when
IndexWriter#deleteAll is called while actively indexing.