Adrien Grand bc341f2b3e
Speed up advancing on the disjunction iterator. (#14052)
Currently, the disjunction iterator puts all clauses in a heap in order to be
able to merge doc IDs in a streaming fashion. This is a good approach for
exhaustive evaluation, when only one clause moves to a different doc ID on
average and the per-iteration cost is in the order of O(log(N)) where N is the
number of clauses.

However, if a selective filter is applied, this could cause many clauses to
move to a different doc ID. In the worst-case scenario, all clauses could move
to a different doc ID and the cost of maintaiting heap invariants could grow to
O(N * log(N)) (every clause introduces a O(log(N)) cost). With many clauses,
this is much higher than the cost of checking all clauses sequentially: O(N).

To protect from this reordering overhead, DisjunctionDISIApproximation now only
puts the cheapest clauses in a heap in a way that tries to achieve up to 1.5
clauses moving to a different doc ID on average. More expensive clauses are
checked linearly.
2024-12-16 15:33:26 +01:00
2022-08-16 20:02:47 +09:00
2010-12-12 15:36:08 +00:00
2024-09-30 11:31:56 +01:00
2024-10-14 17:59:52 +02:00

Apache Lucene

Lucene Logo

Apache Lucene is a high-performance, full-featured text search engine library written in Java.

Build Status Revved up by Develocity

Online Documentation

This README file only contains basic setup instructions. For more comprehensive documentation, visit:

Building

Basic steps:

  1. Install OpenJDK 21.
  2. Clone Lucene's git repository (or download the source distribution).
  3. Run gradle launcher script (gradlew).

We'll assume that you know how to get and set up the JDK - if you don't, then we suggest starting at https://jdk.java.net/ and learning more about Java, before returning to this README.

Contributing

Bug fixes, improvements and new features are always welcome! Please review the Contributing to Lucene Guide for information on contributing.

  • Additional Developer Documentation: dev-docs/

Discussion and Support

Description
Apache Lucene open-source search software
Readme 855 MiB
Languages
Java 97.7%
HTML 1%
Python 0.9%
Lex 0.3%