Commit Graph

13 Commits

Author SHA1 Message Date
Dawid Weiss 8eb4eb2611
LUCENE-9909: add checksums of included files for some jflex generation tasks. Fix a task ordering issue with spotless. (#121)
* LUCENE-9909: Some jflex regeneration tasks should have proper dependencies and also check the checksums of included files.

* Force a dependency on low-level spotless tasks so that they're always properly ordered (hell!). Update ASCIITLD and regenerate the remaining code. Add cross-dependencies between generation tasks that take includes as input.
2021-05-02 19:17:18 +02:00
Dawid Weiss beafd113de
LUCENE-9931: Rename checksummed regen. tasks FooInternal and generated wrappers Foo (#88) 2021-04-16 22:35:51 +02:00
Dawid Weiss 0b1d8ccba6
LUCENE-9925: add checksums to snowball-generated files (#80) 2021-04-13 08:59:31 +02:00
Dawid Weiss e3ae57a3c1
LUCENE-9872: Make the most painful tasks in regenerate fully incremental (#60) 2021-04-02 09:56:47 +02:00
Dawid Weiss 39b8e97613 LUCENE-9896: Add 'quiet exec' utility suppressing exec output unless a failure occurs 2021-03-30 14:38:13 +02:00
Dawid Weiss bb5db1e16d Correct snowball download/unzip sequence to be always consistent. 2021-03-24 15:39:45 +01:00
Dawid Weiss 34f589b0aa Correct run order between tidy and regenerate's deps. Make snowball not fail on Windows (just emit an error). 2021-03-24 15:39:45 +01:00
Dawid Weiss 27510d5f2f LUCENE-9862: cleanup of all regenerate tasks; moved common code into shared bit. Added failOnError for ant.patch. Included jflexStandardTokenizerImpl. 2021-03-24 15:39:45 +01:00
Robert Muir 52f581e351
LUCENE-9605: update snowball to d8cf01ddf37a, adds Yiddish (#2077) 2020-11-14 09:27:08 -05:00
Dawid Weiss d847f40237 LUCENE-9474: make externalTool a function and add a build-stopping message on Windows for snowball generator. 2020-08-30 17:10:18 +02:00
Uwe Schindler 494a8a8e04 LUCENE-9474: Make external tools configurable like in ant through those sysprops: perl.exe, python3.exe, python2.exe 2020-08-23 20:16:22 +02:00
Philippe Ouellet 7a849f6943
LUCENE-9354: Sync French stop words with latest version from Snowball. (#1474)
* Sync French stop words with latest version from Snowball.

This new version removed some French homonyms from the list

* Use latest master commit from snowball-website

* LUCENE-9354: regenerate with 'gradle snowball

* LUCENE-9354: add CHANGES.txt entry
2020-05-01 21:11:35 -04:00
Robert Muir 0203815ab2
LUCENE-9220: regenerate all stemmers/stopwords/test data from snowball 2.0 (#1262)
Previous situation:

* The snowball base classes (Among, SnowballProgram, etc) had accumulated local performance-related changes. There was a task that would also "patch" generated classes (e.g. GermanStemmer) after-the-fact.
* Snowball classes had many "non-changes" from the original such as removal of tabs addition of javadocs, license headers, etc.
* Snowball test data (inputs and expected stems) was incorporated into lucene testing, but this was maintained manually. Also files had become large, making the test too slow (Nightly).
* Snowball stopwords lists from their website were manually maintained. In some cases encoding fixes were manually applied.
* Some generated stemmers (such as Estonian and Armenian) exist in lucene, but have no corresponding `.sbl` file in snowball sources at all.

Besides this mess, snowball project is "moving along" and acquiring new languages, adding non-BSD-licensed test data, huge test data, and other complexity. So it is time to automate the integration better.

New situation:

* Lucene has a `gradle snowball` regeneration task. It works on Linux or Mac only. It checks out their repos, applies the `snowball.patch` in our repository, compiles snowball stemmers, regenerates all java code, applies any adjustments so that our build is happy.
* Tests data is automatically regenerated from the commit hash of the snowball test data repository. Not all languages are tested from their data: only where the license is simple BSD. Test data is also (deterministically) sampled, so that we don't have huge files. We just want to make sure our integration works.
* Randomized tests are still set to test every language with generated fake words. The regeneration task ensures all languages get tested (it writes a simple text file list of them).
* Stopword files are automatically regenerated from the commit hash of the snowball website repository.
* The regeneration procedure is idempotent. This way when stuff does change, you know exactly what happened. For example if test data changes to a different license, you may see a git deletion. Or if a new language/stopwords/test data gets added, you will see git additions.
2020-02-17 12:38:01 -05:00