We have a number of IntervalsSource implementations where automatic minimization of disjunctions can lead to surprising results:
* PHRASE queries can miss matches because a longer matching sub-source is minimized away, leaving a gap
* MAXGAPS queries can miss matches for the same reason
* CONTAINING, NOT_CONTAINING, CONTAINED_BY and NOT_CONTAINED_BY queries can miss matches if the 'big' interval gets minimized

The proper way to deal with this is to rewrite the queries by pulling disjunctions to the top of the query tree, so that PHRASE("a", OR(PHRASE("b", "c"), "c")) is rewritten to OR(PHRASE("a", "b", "c"), PHRASE("a", "c")). To be able to do this generally, we need to add a new pullUpDisjunctions() method to IntervalsSource that performs this rewriting for each source that it would apply to.

Because these rewritten queries will in general be less efficient due to the duplication of effort (e.g., the rewritten PHRASE query above pulls 5 term iterators rather than 4 in the original), we also add an option to Intervals.or() that prevents this rewriting, so that consumers can choose speed over accuracy if it suits their use case.
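The rewrite described above can be illustrated with a minimal sketch over a toy query tree. This is not Lucene's IntervalsSource API: the class PullUpSketch, the Node type, and the helpers alternatives() and countTerms() are hypothetical names introduced only for illustration, and only TERM, PHRASE and OR nodes are modelled. The sketch assumes that a phrase nested inside a phrase can be flattened, as in the example from the description.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

/** Toy model of an intervals query tree (hypothetical; not Lucene's API). */
public class PullUpSketch {

    /** A node is a TERM (no children), a PHRASE, or an OR. */
    public static final class Node {
        final String op;            // "TERM", "PHRASE" or "OR"
        final String term;          // set only for TERM nodes
        final List<Node> children;  // empty for TERM nodes

        Node(String op, String term, List<Node> children) {
            this.op = op;
            this.term = term;
            this.children = children;
        }

        public static Node term(String t) { return new Node("TERM", t, new ArrayList<>()); }
        public static Node phrase(Node... c) { return new Node("PHRASE", null, Arrays.asList(c)); }
        public static Node or(Node... c) { return new Node("OR", null, Arrays.asList(c)); }

        @Override
        public String toString() {
            if (op.equals("TERM")) return "\"" + term + "\"";
            return op + children.stream().map(Node::toString)
                    .collect(Collectors.joining(", ", "(", ")"));
        }
    }

    /** Returns disjunction-free alternatives whose union is equivalent to the node. */
    public static List<Node> alternatives(Node n) {
        List<Node> out = new ArrayList<>();
        if (n.op.equals("TERM")) {
            out.add(n);
        } else if (n.op.equals("OR")) {
            // An OR contributes the alternatives of each of its clauses.
            for (Node c : n.children) out.addAll(alternatives(c));
        } else {
            // PHRASE: cross product of the children's alternatives,
            // flattening nested phrases into one term sequence.
            List<List<Node>> partial = new ArrayList<>();
            partial.add(new ArrayList<>());
            for (Node c : n.children) {
                List<List<Node>> next = new ArrayList<>();
                for (List<Node> prefix : partial) {
                    for (Node alt : alternatives(c)) {
                        List<Node> seq = new ArrayList<>(prefix);
                        if (alt.op.equals("PHRASE")) seq.addAll(alt.children);
                        else seq.add(alt);
                        next.add(seq);
                    }
                }
                partial = next;
            }
            for (List<Node> seq : partial) out.add(new Node("PHRASE", null, seq));
        }
        return out;
    }

    /** Rewrites the tree so that any disjunction sits at the top. */
    public static Node pullUpDisjunctions(Node n) {
        List<Node> alts = alternatives(n);
        return alts.size() == 1 ? alts.get(0) : new Node("OR", null, alts);
    }

    /** Counts leaf terms, a stand-in for the number of term iterators pulled. */
    public static int countTerms(Node n) {
        if (n.op.equals("TERM")) return 1;
        int total = 0;
        for (Node c : n.children) total += countTerms(c);
        return total;
    }

    public static void main(String[] args) {
        Node original = Node.phrase(Node.term("a"),
                Node.or(Node.phrase(Node.term("b"), Node.term("c")), Node.term("c")));
        Node rewritten = pullUpDisjunctions(original);
        System.out.println(original + " uses " + countTerms(original) + " terms");
        System.out.println(rewritten + " uses " + countTerms(rewritten) + " terms");
    }
}
```

In this toy model the original tree has 4 leaf terms and the rewritten one has 5, which mirrors the cost trade-off mentioned in the description. The cross product can grow quickly when several disjunctions are nested, which is one reason an opt-out is useful.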
Apache Lucene and Solr
Apache Lucene is a high-performance, full featured text search engine library written in Java.
Apache Solr is an enterprise search platform written using Apache Lucene. Major features include full-text search, index replication and sharding, and result faceting and highlighting.
Online Documentation
This README file only contains basic setup instructions. For more comprehensive documentation, visit:
Building Lucene/Solr
(You do not need to do this if you downloaded a pre-built package.)
Lucene and Solr are built using Apache Ant. To build Lucene and Solr, run:
ant compile
If you see an error about Ivy missing while invoking Ant (e.g., .ant/lib does not exist), run ant ivy-bootstrap and retry.
Sometimes you may face issues with Ivy (e.g., an incompletely downloaded artifact). Cleaning up the Ivy cache and retrying is a workaround for most such issues:
rm -rf ~/.ivy2/cache
The Solr server can then be packaged and prepared for startup by running the following command from the solr/ directory:
ant server
Running Solr
After building Solr, the server can be started using the bin/solr control scripts. Solr can be run in either standalone or distributed (SolrCloud) mode.
To run Solr in standalone mode, run the following command from the solr/ directory:
bin/solr start
To run Solr in SolrCloud mode, run the following command from the solr/ directory:
bin/solr start -c
The bin/solr control script offers many options for customizing how Solr is started. Common options are described in some detail in solr/README.txt. For an exhaustive treatment of options, run bin/solr start -h from the solr/ directory.
Development/IDEs
Ant can be used to generate project files compatible with most common IDEs. Run the ant command corresponding to your IDE of choice before attempting to import Lucene/Solr.
- Eclipse - ant eclipse (See this for details)
- IntelliJ - ant idea (See this for details)
- Netbeans - ant netbeans (See this for details)
Running Tests
The standard test suite can be run with the command:
ant test
Like Solr itself, test runs can be customized or tailored in a number of ways. For an exhaustive discussion of the options available, run:
ant test-help
Contributing
Please review the Contributing to Solr Guide for information on contributing.
Discussion and Support
- Users Mailing List
- Developers Mailing List
- Lucene Issue Tracker
- Solr Issue Tracker
- IRC: #solr and #solr-dev on freenode.net