Alan Woodward 7c1ba1aebe
LUCENE-9099: Correctly handle repeats in ORDERED and UNORDERED intervals (#1097)
If you have repeating intervals in an ordered or unordered interval source, you currently 
get somewhat confusing behaviour:

* `ORDERED(a, a, b)` will return an extra interval over just a b if it first matches a a b, meaning
that you can get incorrect results if used in a `CONTAINING` filter - 
`CONTAINING(ORDERED(x, y), ORDERED(a, a, b))` will match on the document `a x a b y`
* `UNORDERED(a, a)` will match on documents that just containg a single a.

This commit adds a RepeatingIntervalsSource that correctly handles repeats within 
ordered and unordered sources. It also changes the way that gaps are calculated within 
ordered and unordered sources, by using a new width() method on IntervalIterator. The 
default implementation just returns end() - start() + 1, but RepeatingIntervalsSource 
instead returns the sum of the widths of its child iterators. This preserves maxgaps filtering 
on ordered and unordered sources that contain repeats.

In order to correctly handle matches in this scenario, IntervalsSource#matches now always 
returns an explicit IntervalsMatchesIterator rather than a plain MatchesIterator, which adds 
gaps() and width() methods so that submatches can be combined in the same way that 
subiterators are. Extra checks have been added to checkIntervals() to ensure that the same 
intervals are returned by both iterator and matches, and a fix to 
DisjunctionIntervalIterator#matches() is also included - DisjunctionIntervalIterator minimizes 
its intervals, while MatchesUtils.disjunction does not, so there was a discrepancy between 
the two methods.
2020-02-06 14:44:47 +00:00
2020-01-27 17:23:41 +01:00
2010-12-12 15:36:08 +00:00
2020-01-09 12:36:40 +01:00

Apache Lucene and Solr

Apache Lucene is a high-performance, full featured text search engine library written in Java.

Apache Solr is an enterprise search platform written using Apache Lucene. Major features include full-text search, index replication and sharding, and result faceting and highlighting.

Build Status Build Status

Online Documentation

This README file only contains basic setup instructions. For more comprehensive documentation, visit:

Building Lucene/Solr

(You do not need to do this if you downloaded a pre-built package)

Building with Ant

Lucene and Solr are built using Apache Ant. To build Lucene and Solr, run:

ant compile

If you see an error about Ivy missing while invoking Ant (e.g., .ant/lib does not exist), run ant ivy-bootstrap and retry.

Sometimes you may face issues with Ivy (e.g., an incompletely downloaded artifact). Cleaning up the Ivy cache and retrying is a workaround for most of such issues:

rm -rf ~/.ivy2/cache

The Solr server can then be packaged and prepared for startup by running the following command from the solr/ directory:

ant server

Building with Gradle

There is ongoing work (see LUCENE-9077) to switch the legacy ant-based build system to gradle. Please give it a try!

At the moment of writing, the gradle build requires precisely Java 11 (it may or may not work with newer Java versions).

To build Lucene and Solr, run (./ can be omitted on Windows):

./gradlew assemble

The command above also packages a full distribution of Solr server; the package can be located at:

solr/packaging/build/solr-*

Note that the gradle build does not create or copy binaries throughout the source repository (like ant build does) so you need to switch to the packaging output folder above; the rest of the instructions below remain identical.

Running Solr

After building Solr, the server can be started using the bin/solr control scripts. Solr can be run in either standalone or distributed (SolrCloud mode).

To run Solr in standalone mode, run the following command from the solr/ directory:

bin/solr start

To run Solr in SolrCloud mode, run the following command from the solr/ directory:

bin/solr start -c

The bin/solr control script allows heavy modification of the started Solr. Common options are described in some detail in solr/README.txt. For an exhaustive treatment of options, run bin/solr start -h from the solr/ directory.

Development/IDEs

Ant can be used to generate project files compatible with most common IDEs. Run the ant command corresponding to your IDE of choice before attempting to import Lucene/Solr.

  • Eclipse - ant eclipse (See this for details)
  • IntelliJ - ant idea (See this for details)
  • Netbeans - ant netbeans (See this for details)

Gradle build and IDE support

  • IntelliJ - IntelliJ idea can import the project out of the box. Code formatting conventions should be manually adjusted.
  • Eclipse - Not tested.
  • Netbeans - Not tested.

Running Tests

The standard test suite can be run with the command:

ant test

Like Solr itself, the test-running can be customized or tailored in a number or ways. For an exhaustive discussion of the options available, run:

ant test-help

Gradle build and tests

Run the following command to display an extensive help for running tests with gradle:

./gradlew helpTests

Contributing

Please review the Contributing to Solr Guide for information on contributing.

Discussion and Support

Description
Apache Lucene open-source search software
Readme 602 MiB
Languages
Java 97.7%
HTML 1%
Python 0.9%
Lex 0.3%