Robert Muir 0203815ab2
LUCENE-9220: regenerate all stemmers/stopwords/test data from snowball 2.0 (#1262)
Previous situation:

* The snowball base classes (Among, SnowballProgram, etc.) had accumulated local performance-related changes. There was a task that would also "patch" generated classes (e.g. GermanStemmer) after the fact.
* Snowball classes had many "non-changes" from the original, such as removal of tabs, addition of javadocs, license headers, etc.
* Snowball test data (inputs and expected stems) was incorporated into lucene testing, but this was maintained manually. Also, the files had become large, making the test too slow (Nightly).
* Snowball stopwords lists from their website were manually maintained. In some cases encoding fixes were manually applied.
* Some generated stemmers (such as Estonian and Armenian) exist in lucene, but have no corresponding `.sbl` file in snowball sources at all.

Besides this mess, the snowball project is "moving along" and acquiring new languages, adding non-BSD-licensed test data, huge test data, and other complexity. So it is time to automate the integration better.

New situation:

* Lucene has a `gradle snowball` regeneration task (see the example after this list). It works on Linux or Mac only. It checks out their repos, applies the `snowball.patch` in our repository, compiles snowball stemmers, regenerates all java code, and applies any adjustments so that our build is happy.
* Test data is automatically regenerated from the commit hash of the snowball test data repository. Not all languages are tested from their data: only those where the license is simple BSD. Test data is also (deterministically) sampled so that we don't have huge files. We just want to make sure our integration works.
* Randomized tests are still set to test every language with generated fake words. The regeneration task ensures all languages get tested (it writes a simple text file list of them).
* Stopword files are automatically regenerated from the commit hash of the snowball website repository.
* The regeneration procedure is idempotent. This way when stuff does change, you know exactly what happened. For example if test data changes to a different license, you may see a git deletion. Or if a new language/stopwords/test data gets added, you will see git additions.
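For example, the whole pipeline would be a single invocation (a sketch; Linux or Mac only, as noted above):

./gradlew snowball

followed by an ordinary `git diff` to review exactly what changed.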

Apache Lucene and Solr

Apache Lucene is a high-performance, full-featured text search engine library written in Java.

Apache Solr is an enterprise search platform written using Apache Lucene. Major features include full-text search, index replication and sharding, and result faceting and highlighting.


Online Documentation

This README file only contains basic setup instructions. For more comprehensive documentation, visit https://lucene.apache.org/.

Building Lucene/Solr

(You do not need to do this if you downloaded a pre-built package)

Building with Ant

Lucene and Solr are built using Apache Ant. To build Lucene and Solr, run:

ant compile

If you see an error about Ivy missing while invoking Ant (e.g., .ant/lib does not exist), run ant ivy-bootstrap and retry.
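For example, a typical recovery sequence (ant ivy-bootstrap installs Ivy into ~/.ant/lib):

ant ivy-bootstrap
ant compile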

Sometimes you may face issues with Ivy (e.g., an incompletely downloaded artifact). Cleaning up the Ivy cache and retrying is a workaround for most such issues:

rm -rf ~/.ivy2/cache

The Solr server can then be packaged and prepared for startup by running the following command from the solr/ directory:

ant server
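Putting the steps together, a typical sequence from the repository root might look like this (a sketch; bin/solr is covered in the Running Solr section below):

cd solr
ant server
bin/solr start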

Building with Gradle

There is ongoing work (see LUCENE-9077) to switch the legacy ant-based build system to gradle. Please give it a try!

At the moment of writing, the gradle build requires precisely Java 11 (it may or may not work with newer Java versions).

To build Lucene and Solr, run (./ can be omitted on Windows):

./gradlew assemble

The command above also packages a full distribution of Solr server; the package can be located at:

solr/packaging/build/solr-*

Note that the gradle build does not create or copy binaries throughout the source repository (like the ant build does), so you need to switch to the packaging output folder above; the rest of the instructions below remain identical.
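For example, to start the freshly assembled server (a sketch; the exact directory name depends on the version being built):

cd solr/packaging/build/solr-*
bin/solr start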

Running Solr

After building Solr, the server can be started using the bin/solr control scripts. Solr can be run in either standalone or distributed (SolrCloud) mode.

To run Solr in standalone mode, run the following command from the solr/ directory:

bin/solr start

To run Solr in SolrCloud mode, run the following command from the solr/ directory:

bin/solr start -c

The bin/solr control script allows extensive customization of the started Solr instance. Common options are described in some detail in solr/README.txt. For an exhaustive treatment of options, run bin/solr start -h from the solr/ directory.
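For example, a sketch of a customized startup, assuming the commonly documented -p (listen port) and -m (heap size) options:

bin/solr start -p 8984 -m 2g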

Development/IDEs

Ant can be used to generate project files compatible with most common IDEs. Run the ant command corresponding to your IDE of choice before attempting to import Lucene/Solr.

  • Eclipse - ant eclipse
  • IntelliJ - ant idea
  • Netbeans - ant netbeans

Gradle build and IDE support

  • IntelliJ - IntelliJ IDEA can import the project out of the box. Code formatting conventions should be manually adjusted.
  • Eclipse - Not tested.
  • Netbeans - Not tested.

Running Tests

The standard test suite can be run with the command:

ant test

Like Solr itself, the test run can be customized or tailored in a number of ways. For an exhaustive discussion of the options available, run:

ant test-help
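For example, a single test class can be run with a fixed random seed (a sketch; TestDemo is a placeholder class name, and ant test-help lists the supported properties):

ant test -Dtestcase=TestDemo -Dtests.seed=DEADBEEF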

Gradle build and tests

Run the following command to display extensive help for running tests with gradle:

./gradlew helpTests
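For example, to run a single test class in one module (a sketch using standard gradle options; TestDemo is a placeholder class name):

./gradlew -p lucene/core test --tests TestDemo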

Contributing

Please review the Contributing to Solr Guide for information on contributing.

Discussion and Support
