Robert Muir 22bbc603b4
Add deprecated complement (~) operator to RegExp (#13739)
Previously all regexp parsing required determinization and minimization up-front, which can be costly (exponential time).

Lucene 10 removes the determinization and minimization from RegExp and allows the user to choose:

* determinize() the result and get the DFA query execution of previous releases.
* don't determinize() and possibly get a new NFA query that determinizes-as-it-goes.

Complement of arbitrary automata is incompatible with this choice, as it requires determinization for correctness. It was previously a non-default operator that could be enabled with a special flag: RegExp.COMPLEMENT, or would be included with RegExp.ALL, which turns on all special syntax flags. Lucene 10 removed the operator, as it can't be supported while still giving the user the NFA/DFA choice, and requires exponential time during parsing.

To ease transition: add RegExp.DEPRECATED_COMPLEMENT syntax flag and Kind.DEPRECATED_COMPLEMENT node:
* syntax flag can be enabled with RegExp(s, RegExp.DEPRECATED_COMPLEMENT);
* syntax flag is **NOT** included by RegExp.ALL: e.g. you must do RegExp(s, RegExp.ALL | RegExp.DEPRECATED_COMPLEMENT) to get ALL flags and also the deprecated complement (~) operator. This enforces a java deprecation reference in the calling code to enable the flag.
* deprecated complement (~) runs with an internal limit: Operations.DEFAULT_DETERMINIZE_WORK_LIMIT. It is not configurable. If it is exceeded, you get TooComplexToDeterminize() exception.
* there is intentionally only a single dead-simple test so that this hack doesn't cause us pain with CI/builds. We don't want random automata testing to only rarely encounter an exponential algorithm!

After lucene 10 is branched, this deprecated support can be removed by reverting this commit.
2024-09-09 08:05:00 -04:00
2022-08-16 20:02:47 +09:00
2010-12-12 15:36:08 +00:00
2023-04-18 15:58:09 -04:00

Apache Lucene

Lucene Logo

Apache Lucene is a high-performance, full-featured text search engine library written in Java.

Build Status Revved up by Develocity

Online Documentation

This README file only contains basic setup instructions. For more comprehensive documentation, visit:

Building

Basic steps:

  1. Install OpenJDK 21.
  2. Clone Lucene's git repository (or download the source distribution).
  3. Run gradle launcher script (gradlew).

We'll assume that you know how to get and set up the JDK - if you don't, then we suggest starting at https://jdk.java.net/ and learning more about Java, before returning to this README.

Contributing

Bug fixes, improvements and new features are always welcome! Please review the Contributing to Lucene Guide for information on contributing.

  • Additional Developer Documentation: dev-docs/

Discussion and Support

Description
Apache Lucene open-source search software
Readme 865 MiB
Languages
Java 97.7%
HTML 1%
Python 0.9%
Lex 0.3%