Apache Lucene open-source search software
Robert Muir be94a667f2
LUCENE-9827: avoid wasteful recompression for small segments (#28)
Require that the segment has enough dirty documents to create a clean
chunk before recompressing during merge: there must be at least maxChunkSize.

This prevents wasteful recompression with small flushes (e.g. every
document): we ensure recompression achieves some "permanent" progress.

Expose maxDocsPerChunk as a parameter for term vectors too, matching the
stored fields format. This makes testing easier.

Increment numDirtyDocs for partially optimized merges:
If segment N needs recompression, we have to flush any buffered docs
before bulk-copying segment N+1. Don't just increment numDirtyChunks;
make sure numDirtyDocs is incremented, too.
This doesn't have a performance impact, and is unrelated to tooDirty()
improvements, but it is easier to reason about things with correct
statistics in the index.

Further tuning of how dirtiness is measured: for simplicity, just use the percentage
of dirty chunks.

Co-authored-by: Adrien Grand <jpountz@gmail.com>
2021-04-06 14:18:48 -04:00
.github Cleanup readme file, doaps and copy build instructions from lucene subfolder (#6) 2021-03-10 16:10:06 +01:00
.muse SOLR-14883 Add a Muse (Continuous assurance platform) configuration (#1901) 2020-09-23 17:42:19 -07:00
buildSrc LUCENE-9901: UnicodeData.java has no regeneration task (#63) 2021-04-05 20:12:56 +02:00
dev-docs SOLR-15160 update cloud.sh (#2393) 2021-02-21 14:36:19 -05:00
dev-tools LUCENE-9871: clean up some old cruft and shuffle files around. Correct inputs/outputs on check broken links so that it's incremental. 2021-03-30 10:55:19 +02:00
gradle LUCENE-9901: UnicodeData.java has no regeneration task (#63) 2021-04-05 20:12:56 +02:00
help LUCENE-9854: Clean up utilities to download and extract test/ benchmark data sets. (#27) 2021-03-22 12:22:39 +01:00
lucene LUCENE-9827: avoid wasteful recompression for small segments (#28) 2021-04-06 14:18:48 -04:00
.asf.yaml titles for github 2021-03-10 12:51:06 +01:00
.dir-locals.el LUCENE-9322: Add Lucene90 codec, including VectorFormat 2020-10-18 07:49:36 -04:00
.git-blame-ignore-revs LUCENE-9570: code reformatting [record rev]. 2021-01-05 13:44:42 +01:00
.gitattributes LUCENE-9077: make git always keep .gradle files with LF EOLs. 2020-04-09 13:55:16 +02:00
.gitignore Ignore sdkmanrc file on Git (#58) 2021-04-02 01:04:14 +09:00
.hgignore LUCENE-2792: add FST impl 2010-12-12 15:36:08 +00:00
LICENSE LUCENE-9233 Add top level LICENSE file 2020-02-20 20:53:57 +01:00
README.md Point jdk.java.net instead of OracleJDK page. (#42) 2021-03-26 08:37:52 +09:00
build.gradle LUCENE-9901: UnicodeData.java has no regeneration task (#63) 2021-04-05 20:12:56 +02:00
gradlew Gradle hotfix in preparation for Jenkins: Fix for whitespace in directory violations 2020-08-23 17:51:11 +02:00
gradlew.bat Gradle hotfix in preparation for Jenkins: Fix for whitespace in directory violations 2020-08-23 17:51:11 +02:00
settings.gradle LUCENE-9871: clean up some old cruft and shuffle files around. Correct inputs/outputs on check broken links so that it's incremental. 2021-03-30 10:55:19 +02:00
versions.lock SOLR-15002 Upgrade httpclient to 4.5.13 and httpcore to 4.4.13 (#14) 2021-03-17 22:25:42 -04:00
versions.props SOLR-15002 Upgrade httpclient to 4.5.13 and httpcore to 4.4.13 (#14) 2021-03-17 22:25:42 -04:00

README.md

Apache Lucene


Apache Lucene is a high-performance, full-featured text search engine library written in Java.


Online Documentation

This README file only contains basic setup instructions. For more comprehensive documentation, visit:

Building with Gradle

Basic steps:

  1. Install OpenJDK 11 (or greater).
  2. Download Lucene from Apache and unpack it (or clone the git repository).
  3. Run the Gradle launcher script (gradlew).

Step 0) Set up your development environment (OpenJDK 11 or greater)

We'll assume that you know how to get and set up the JDK; if you don't, we suggest starting at https://jdk.java.net/ and learning more about Java before returning to this README. Lucene runs with Java 11 or later.
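One way to confirm that your JDK meets this requirement is sketched below. It is a hypothetical bash helper (the function names are illustrative and are not part of Lucene's build) that reads the version string printed by "java -version" and handles both the legacy "1.8.0_x" scheme and the "11.0.x" scheme used since Java 9.

```shell
# Sketch: check that the JDK on PATH is Java 11 or later.
# Function names are hypothetical helpers, not part of the Lucene build.

# Print the major version for a Java version string.
# Handles the legacy scheme ("1.8.0_282" -> 8) and the modern one ("11.0.10" -> 11, "17" -> 17).
java_major_version() {
  local version="$1"
  local major="${version%%[.+-]*}"   # keep text before the first '.', '+' or '-'
  if [ "$major" = "1" ]; then
    # Legacy scheme: the major version is the second component.
    major="${version#1.}"
    major="${major%%[.+_-]*}"
  fi
  echo "$major"
}

# Extract the quoted version from `java -version` (which writes to stderr),
# e.g. openjdk version "11.0.10" 2021-01-19  ->  11.0.10
installed_java_version() {
  java -version 2>&1 | awk -F '"' '/version/ { print $2; exit }'
}

# Print the detected major version, or an error and a non-zero status if it is too old.
check_jdk() {
  local required=11
  local version major
  version="$(installed_java_version)"
  if [ -z "$version" ]; then
    echo "Could not determine the Java version (is java on PATH?)" >&2
    return 1
  fi
  major="$(java_major_version "$version")"
  if [ "$major" -lt "$required" ]; then
    echo "Java $major found; Lucene needs Java $required or later" >&2
    return 1
  fi
  echo "Java $major found"
}
```

After sourcing these functions, running check_jdk either reports the detected major version or exits with a non-zero status, which makes it easy to call at the top of a setup script.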

Lucene uses Gradle for build control.

NOTE: Lucene changed from Ant to Gradle as of release 9.0. Prior releases still use Ant.

Step 1) Check out or download the Lucene source code

You can clone the source code from GitHub:

https://github.com/apache/lucene

or get Lucene source archives for a particular release from:

https://lucene.apache.org/core/downloads.html

Download either a zip or a tarred/gzipped version of the archive, and uncompress it into a directory of your choice.
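If you downloaded an archive, a small bash helper like the hypothetical one below can unpack either format into a directory of your choice. It assumes the standard unzip and tar tools are installed; the function name is illustrative only.

```shell
# Sketch: unpack a downloaded source archive into a target directory.
# Handles the two formats offered for download: .zip and tarred/gzipped (.tgz / .tar.gz).
extract_archive() {
  local archive="$1" dest="${2:-.}"   # destination defaults to the current directory
  mkdir -p "$dest" || return 1
  case "$archive" in
    *.zip)
      unzip -q "$archive" -d "$dest"
      ;;
    *.tgz|*.tar.gz)
      tar -xzf "$archive" -C "$dest"
      ;;
    *)
      echo "Unsupported archive format: $archive" >&2
      return 1
      ;;
  esac
}
```

For example, extract_archive lucene-src.tgz ~/src, where "lucene-src.tgz" is a placeholder for whichever release archive you downloaded. If you prefer Git, git clone https://github.com/apache/lucene.git clones the repository instead.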

Step 2) Run Gradle

Run "./gradlew help". It lists the main tasks you can run to display individual help sub-topics.

If you want to build Lucene, type:

./gradlew assemble

NOTE: DO NOT use the gradle command already installed on your machine (unless you know what you are doing). The Gradle wrapper (gradlew) does the job: it downloads the correct Gradle version and sets up the necessary configuration.

The first time you run Gradle, it will create a file "gradle.properties" that contains machine-specific settings. Normally you can use this file as-is, but it can be modified if necessary.

./gradlew check will assemble Lucene and run all validation tasks (including unit tests).

./gradlew help will print a list of help guides that explain how the build and the typical workflow work.

If you want to build the documentation, type:

./gradlew documentation
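The commands above can be wrapped in a small bash function that always goes through the Gradle wrapper rather than a system-wide gradle install, as the NOTE on the wrapper recommends. This is a hypothetical convenience sketch, not part of Lucene's tooling; defaulting to "check" when no task is given is an arbitrary design choice.

```shell
# Sketch: run Gradle tasks via the wrapper from the root of a Lucene checkout.
# run_lucene_build is a hypothetical helper name.
run_lucene_build() {
  if [ ! -x ./gradlew ]; then
    echo "./gradlew not found; run this from the root of the Lucene checkout" >&2
    return 1
  fi
  # With no arguments, run the full validation build (assemble + tests).
  if [ "$#" -eq 0 ]; then
    set -- check
  fi
  ./gradlew "$@"
}

# Example usage:
#   run_lucene_build help            # list help sub-topics
#   run_lucene_build assemble        # build Lucene
#   run_lucene_build check           # build and run all validation tasks, including unit tests
#   run_lucene_build documentation   # build the documentation
```

Because the function refuses to run without an executable ./gradlew in the current directory, it cannot silently fall back to a different Gradle version installed elsewhere on the machine.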

Gradle build and IDE support

  • IntelliJ - IntelliJ IDEA can import the project out of the box.
  • Eclipse - Basic support (help/IDEs.txt).
  • NetBeans - Not tested.

Contributing

Please review the Contributing to Lucene Guide for information on contributing.

Discussion and Support