Apache Lucene open-source search software
Matthias Osswald 3af0c6872a Add Option to Set Subtoken Position Increment for Dictionary Decompounder
This pull request adds an option to Lucene's DictionaryDecompounder that sets the position increment of subtokens to one. The option is needed for AND searches that involve subtokens.

Currently, the DictionaryDecompounder always emits subtokens with a position increment of zero. With this change, users can set subtokenPositionIncrement to one, so that each subtoken gets its own position. For example, if 'orangen' and 'schokolade' are in the dictionary, an Elasticsearch match clause that uses the AND operator to search for 'orangenschokolade' is then correctly evaluated as 'orangen AND schokolade'.

By default, the DictionaryDecompounder emits the original compound token. This behavior is unchanged when subtokenPositionIncrement is zero. When it is set to one, the DictionaryDecompounder emits only the individual subtokens, and the original compound token is not emitted.
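
The effect of the two settings on an AND query can be illustrated with a small, self-contained model. This is only a sketch and does not use Lucene's classes: `SubtokenPositionSketch`, `Token`, `decompound` and `toAndQuery` are hypothetical names introduced here, and the grouping rule (tokens that share a position are treated as alternatives, while distinct positions are combined with AND) is an assumption about how a query builder typically interprets stacked tokens.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical model of the behavior described in the commit message; not Lucene code.
public class SubtokenPositionSketch {

    /** A term plus its position increment relative to the previous token. */
    record Token(String term, int positionIncrement) {}

    /**
     * Models the decompounder output for a single compound word.
     * Increment 0: the compound is kept and the subtokens are stacked on its position.
     * Increment 1: only the subtokens are emitted, each at its own position.
     */
    static List<Token> decompound(String compound, List<String> subtokens,
                                  int subtokenPositionIncrement) {
        List<Token> out = new ArrayList<>();
        if (subtokenPositionIncrement == 0) {
            out.add(new Token(compound, 1));
            for (String s : subtokens) {
                out.add(new Token(s, 0));
            }
        } else {
            for (String s : subtokens) {
                out.add(new Token(s, 1));
            }
        }
        return out;
    }

    /**
     * Assumed query-building rule: terms at the same position become an OR group,
     * and the groups at different positions are joined with AND.
     */
    static String toAndQuery(List<Token> tokens) {
        Map<Integer, List<String>> byPosition = new TreeMap<>();
        int position = -1; // the first token with increment 1 lands on position 0
        for (Token t : tokens) {
            position += t.positionIncrement();
            byPosition.computeIfAbsent(position, p -> new ArrayList<>()).add(t.term());
        }
        List<String> clauses = new ArrayList<>();
        for (List<String> terms : byPosition.values()) {
            clauses.add(terms.size() == 1
                    ? terms.get(0)
                    : "(" + String.join(" OR ", terms) + ")");
        }
        return String.join(" AND ", clauses);
    }

    public static void main(String[] args) {
        List<String> parts = List.of("orangen", "schokolade");
        // prints "(orangenschokolade OR orangen OR schokolade)"
        System.out.println(toAndQuery(decompound("orangenschokolade", parts, 0)));
        // prints "orangen AND schokolade"
        System.out.println(toAndQuery(decompound("orangenschokolade", parts, 1)));
    }
}
```

Under these assumptions, an increment of zero leaves all three terms on one position, so an AND query has nothing to combine, while an increment of one gives each subtoken its own position and produces the 'orangen AND schokolade' query described above.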
2023-07-31 15:50:39 +02:00
.github Generate gradle.properties from gradlew (#12131) 2023-02-06 19:47:15 +01:00
buildSrc Implement MMapDirectory with Java 21 Project Panama Preview API (#12294) 2023-06-12 21:07:04 +02:00
dev-docs a bit of clarification about GitHub Milestone 2022-08-28 13:52:58 +09:00
dev-tools DOAP changes for release 9.7.0 2023-06-26 11:05:46 +02:00
gradle Enable search for site javadocs (#12430) 2023-07-24 10:38:19 -04:00
help Generate gradle.properties from gradlew (#12131) 2023-02-06 19:47:15 +01:00
lucene Add Option to Set Subtoken Position Increment for Dictionary Decompounder 2023-07-31 15:50:39 +02:00
.asf.yaml .asf.yaml 2022-08-16 20:02:47 +09:00
.dir-locals.el LUCENE-9322: Add Lucene90 codec, including VectorFormat 2020-10-18 07:49:36 -04:00
.git-blame-ignore-revs LUCENE-9570: code reformatting [record rev]. 2021-01-05 13:44:42 +01:00
.gitattributes LUCENE-10305: Ensure line endings of versions.props is LF 2021-12-11 10:10:44 +09:00
.gitignore LUCENE-9920: Remove binary gradle-wrapper.jar from the repository 2021-04-10 16:08:39 +02:00
.hgignore LUCENE-2792: add FST impl 2010-12-12 15:36:08 +00:00
.lift.toml Disable liftbot, we have our own tools 2022-05-05 22:27:57 +02:00
CONTRIBUTING.md Fix type in CONTRIBUTING.md (#11879) 2022-11-01 20:10:05 +00:00
LICENSE.txt LUCENE-10163 Move LICENSE and NOTICE file to top level (#388) 2021-10-18 01:24:11 +02:00
NOTICE.txt Cleanup NOTICE.txt (#12227) 2023-04-18 15:58:09 -04:00
README.md Allow building with java 18 now that gradle supports it (#11889) 2022-10-28 23:41:09 -04:00
build.gradle Integrate the Incubating Panama Vector API (#12311) 2023-05-25 07:59:50 +01:00
gradlew Generate gradle.properties from gradlew (#12131) 2023-02-06 19:47:15 +01:00
gradlew.bat Generate gradle.properties from gradlew (#12131) 2023-02-06 19:47:15 +01:00
settings.gradle GH-11172: remove WindowsDirectory and native subproject. (#11774) 2022-09-15 16:22:46 +02:00
versions.lock remove non-NRT replication support (#12038) 2023-01-14 11:14:46 -05:00
versions.props Upgrade to errorprone 2.18 (#12086) 2023-01-14 14:39:23 -05:00

README.md

Apache Lucene

Apache Lucene is a high-performance, full-featured text search engine library written in Java.

Online Documentation

This README file only contains basic setup instructions. For more comprehensive documentation, visit:

Building

Basic steps:

  1. Install OpenJDK 17 or 18.
  2. Clone Lucene's git repository (or download the source distribution).
  3. Run the Gradle launcher script (gradlew).

We'll assume that you know how to get and set up the JDK. If you don't, we suggest starting at https://jdk.java.net/ and learning more about Java before returning to this README.

See the Contributing Guide for details.

Contributing

Bug fixes, improvements and new features are always welcome! Please review the Contributing to Lucene Guide for information on contributing.

Discussion and Support