Apache Lucene open-source search software
Robert Muir 7a872c7a5c
LUCENE-10296: Stop minimizing regexp (#528)
In current trunk, we let the caller (e.g. RegExpQuery) try to "reduce" the expression. Neither the parser nor the low-level executors implicitly call exponential-time algorithms anymore.

But now that we have cleaned this up, we can see it is even worse than just calling determinize(). We still call minimize(), which is much crazier and does much more work.

We stopped doing this for all other AutomatonQuery subclasses a long time ago, as we determined that it didn't help performance. Additionally, minimization vs. determinization is even less important than in the early days, when we ran into trouble: the representation has gotten a lot better. Today, when you call finishState, we do a lot of practical sorting/coalescing on the fly. We also added the fancy UTF32-to-UTF8 automata converter, which makes the worst-case space per state significantly lower than it was before. So why minimize()?

Let's just replace minimize() calls with determinize() calls. I've already swapped them out for all of src/test, to get Jenkins looking for issues ahead of time.

This change moves Hopcroft minimization (MinimizeOperations) to src/test for now. I'd like to explore nuking it from there as a next step; any tests that truly need minimization should be fine with Brzozowski's algorithm.
2021-12-08 21:44:26 -05:00
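
The commit message above contrasts determinization with minimization and mentions Brzozowski's algorithm as a fallback for tests. As a rough illustration of the difference, here is a minimal, self-contained sketch on a toy automaton. Everything in it (the `BrzozowskiSketch` class, the `Nfa` type, the `sample()` helper) is hypothetical and written for this example; it is not Lucene's `Automaton`, `Operations`, or `MinimizeOperations` code, and it ignores work limits and character ranges.

```java
import java.util.*;

/** Illustrative only: a toy automaton, not Lucene's Automaton class. */
final class BrzozowskiSketch {
  /** NFA without epsilon transitions; states are numbered from 0. */
  static final class Nfa {
    final Set<Integer> initial = new HashSet<>();
    final Set<Integer> accept = new HashSet<>();
    // trans.get(state).get(label) is the set of target states.
    final List<Map<Character, Set<Integer>>> trans = new ArrayList<>();

    int addState() {
      trans.add(new HashMap<>());
      return trans.size() - 1;
    }

    void addTransition(int from, char label, int to) {
      trans.get(from).computeIfAbsent(label, k -> new HashSet<>()).add(to);
    }
  }

  /** Flips every transition and swaps initial and accepting states. */
  static Nfa reverse(Nfa a) {
    Nfa r = new Nfa();
    for (int s = 0; s < a.trans.size(); s++) r.addState();
    r.initial.addAll(a.accept);
    r.accept.addAll(a.initial);
    for (int s = 0; s < a.trans.size(); s++) {
      for (Map.Entry<Character, Set<Integer>> e : a.trans.get(s).entrySet()) {
        for (int t : e.getValue()) r.addTransition(t, e.getKey(), s);
      }
    }
    return r;
  }

  /** Subset construction; only subsets reachable from the start become states. */
  static Nfa determinize(Nfa a) {
    Nfa d = new Nfa();
    Map<Set<Integer>, Integer> ids = new HashMap<>();
    Deque<Set<Integer>> work = new ArrayDeque<>();
    Set<Integer> start = new HashSet<>(a.initial);
    ids.put(start, d.addState());
    d.initial.add(0);
    work.add(start);
    while (!work.isEmpty()) {
      Set<Integer> subset = work.remove();
      int id = ids.get(subset);
      if (!Collections.disjoint(subset, a.accept)) d.accept.add(id);
      // Union the targets of every member state, grouped by label.
      Map<Character, Set<Integer>> next = new HashMap<>();
      for (int s : subset) {
        for (Map.Entry<Character, Set<Integer>> e : a.trans.get(s).entrySet()) {
          next.computeIfAbsent(e.getKey(), k -> new HashSet<>()).addAll(e.getValue());
        }
      }
      for (Map.Entry<Character, Set<Integer>> e : next.entrySet()) {
        Integer target = ids.get(e.getValue());
        if (target == null) {
          target = d.addState();
          ids.put(e.getValue(), target);
          work.add(e.getValue());
        }
        d.addTransition(id, e.getKey(), target);
      }
    }
    return d;
  }

  /** Brzozowski's algorithm: reverse, determinize, reverse, determinize. */
  static Nfa minimize(Nfa a) {
    return determinize(reverse(determinize(reverse(a))));
  }

  /** Simulates the automaton on the input by tracking the set of live states. */
  static boolean accepts(Nfa a, String input) {
    Set<Integer> current = new HashSet<>(a.initial);
    for (char c : input.toCharArray()) {
      Set<Integer> next = new HashSet<>();
      for (int s : current) next.addAll(a.trans.get(s).getOrDefault(c, Collections.emptySet()));
      current = next;
    }
    return !Collections.disjoint(current, a.accept);
  }

  /** Toy automaton for the language {"a", "b"} with two separate accept states. */
  static Nfa sample() {
    Nfa n = new Nfa();
    int s0 = n.addState(), s1 = n.addState(), s2 = n.addState();
    n.initial.add(s0);
    n.accept.add(s1);
    n.accept.add(s2);
    n.addTransition(s0, 'a', s1);
    n.addTransition(s0, 'b', s2);
    return n;
  }

  public static void main(String[] args) {
    System.out.println("determinized states: " + determinize(sample()).trans.size());
    System.out.println("minimized states: " + minimize(sample()).trans.size());
  }
}
```

On this toy input the determinized automaton keeps three states while the Brzozowski result has two, and both accept the same strings. That is the trade-off the commit message describes: determinization alone does not merge equivalent states, but it avoids the extra minimization passes.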
.github LUCENE-10222: Enable github precommit check workflow on branch_9x 2021-11-05 09:04:10 +01:00
.muse SOLR-14883 Add a Muse (Continuous assurance platform) configuration (#1901) 2020-09-23 17:42:19 -07:00
buildSrc upgrade ecj linter from 3.25.0 -> 3.27.0 (#483) 2021-11-28 12:05:19 -05:00
dev-docs SOLR-15160 update cloud.sh (#2393) 2021-02-21 14:36:19 -05:00
dev-tools DOAP changes for release 9.0.0 2021-12-07 14:38:41 +01:00
gradle LUCENE-10243: increase unicode versions of tokenizers to 12.1 (#465) 2021-12-03 20:20:57 -05:00
help LUCENE-9660: correct help/tests.txt. 2021-10-26 08:45:58 +02:00
lucene LUCENE-10296: Stop minimizing regexp (#528) 2021-12-08 21:44:26 -05:00
.asf.yaml titles for github 2021-03-10 12:51:06 +01:00
.dir-locals.el LUCENE-9322: Add Lucene90 codec, including VectorFormat 2020-10-18 07:49:36 -04:00
.git-blame-ignore-revs LUCENE-9570: code reformatting [record rev]. 2021-01-05 13:44:42 +01:00
.gitattributes LUCENE-9077: make git always keep .gradle files with LF EOLs. 2020-04-09 13:55:16 +02:00
.gitignore LUCENE-9920: Remove binary gradle-wrapper.jar from the repository 2021-04-10 16:08:39 +02:00
.hgignore LUCENE-2792: add FST impl 2010-12-12 15:36:08 +00:00
LICENSE.txt LUCENE-10163 Move LICENSE and NOTICE file to top level (#388) 2021-10-18 01:24:11 +02:00
NOTICE.txt LUCENE-10166: removed module-level README.txt and modified a few links, removed a few obsolete instructions from 20 years ago. (#379) 2021-10-19 09:45:49 +02:00
README.md Improve MIGRATE.md around analyzers artifacts. (#488) 2021-11-29 17:04:15 -05:00
build.gradle LUCENE-10240: gradle regenerate fails on java 17 (#449) 2021-11-17 18:36:34 +01:00
gradlew LUCENE-10198: remove debug statement that crept in. 2021-10-26 21:33:19 +02:00
gradlew.bat LUCENE-10198: Allow external JAVA_OPTS in gradlew scripts; use sane defaults (heap, stack and system proxies) (#405) 2021-10-26 09:15:55 +02:00
settings.gradle LUCENE-9488: rewrite distribution assembly, signing and checksum generation (#372) 2021-10-13 11:50:58 +02:00
versions.lock remove unnecessary "dependencies" in versions.props (#526) 2021-12-07 21:22:54 -05:00
versions.props remove unnecessary "dependencies" in versions.props (#526) 2021-12-07 21:22:54 -05:00

README.md

Apache Lucene

Apache Lucene is a high-performance, full-featured text search engine library written in Java.

Online Documentation

This README file only contains basic setup instructions. For more comprehensive documentation, visit:

Building with Gradle

Basic steps:

  0. Install OpenJDK 11 (or greater).
  1. Clone Lucene's git repository (or download the source distribution).
  2. Run the Gradle launcher script (gradlew).

Step 0) Set up your development environment (OpenJDK 11 or greater)

We'll assume that you know how to get and set up the JDK - if you don't, then we suggest starting at https://jdk.java.net/ and learning more about Java, before returning to this README. Lucene runs with Java 11 or later.
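
If you are unsure which Java version a given `java` binary provides, a small check along these lines may help. This is only a sketch: the `JavaVersionCheck` class and its helper methods are hypothetical names made up for this example and are not part of Lucene or its build. It reads the standard `java.specification.version` system property and compares it against the minimum of 11 stated above.

```java
/** Illustrative only: checks the running JVM against the README's minimum. */
final class JavaVersionCheck {
  // Minimum feature release, taken from the README text ("Java 11 or later").
  static final int MIN_FEATURE = 11;

  /** Parses legacy values such as "1.8" as 8 and modern values such as "17" as 17. */
  static int parseFeature(String specVersion) {
    String v = specVersion.startsWith("1.") ? specVersion.substring(2) : specVersion;
    int dot = v.indexOf('.');
    return Integer.parseInt(dot < 0 ? v : v.substring(0, dot));
  }

  static boolean isSupported(int feature) {
    return feature >= MIN_FEATURE;
  }

  public static void main(String[] args) {
    int feature = parseFeature(System.getProperty("java.specification.version"));
    System.out.println("Java feature release: " + feature);
    System.out.println(isSupported(feature) ? "OK for Lucene" : "Too old: need 11 or later");
  }
}
```

Saved as `JavaVersionCheck.java`, it can be run directly with `java JavaVersionCheck.java` on Java 11 or later; older JDKs need a separate `javac` step first.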

Lucene uses Gradle for build control. Gradle is itself Java-based and may be incompatible with newer Java versions. You can still build and test Lucene with these Java releases; see jvms.txt for more information.

NOTE: Lucene changed from Apache Ant to Gradle as of release 9.0. Prior releases still use Apache Ant.

Step 1) Check out/Download Lucene source code

You can clone the source code from GitHub:

https://github.com/apache/lucene

or get Lucene source archives for a particular release from:

https://lucene.apache.org/core/downloads.html

Download the source archive and uncompress it into a directory of your choice.

Step 2) Run Gradle

Run "./gradlew help"; it lists the main tasks you can run to display help sub-topics.

If you want to build Lucene, type:

./gradlew assemble

NOTE: DO NOT use a gradle command that may already be installed on your machine. Doing so may use a different Gradle version than the project requires, which is known to cause very cryptic errors. The "gradle wrapper" (gradlew script) does everything required to build the project from scratch: it downloads the correct version of Gradle, sets up sane local configurations, and is tested on multiple environments.

The first time you run gradlew, it will create a file "gradle.properties" that contains machine-specific settings. Normally you can use this file as-is, but it can be modified if necessary.

./gradlew check will assemble Lucene and run all validation tasks (including tests).

./gradlew help will print a list of help guides that introduce and explain various parts of the build system, including typical workflow tasks.

If you want to just build the documentation, type:

./gradlew documentation

IDE support

  • IntelliJ - IntelliJ IDEA can import and build Gradle-based projects out of the box.
  • Eclipse - Basic support (help/IDEs.txt).
  • NetBeans - Not tested.

Contributing

Bug fixes, improvements and new features are always welcome! Please review the Contributing to Lucene Guide for information on contributing.

Discussion and Support