98 Commits

Author SHA1 Message Date
Dawid Weiss
a38713907d LUCENE-9866: regenerate kuromoji dict in regenerate 2021-03-25 11:43:37 +01:00
Dawid Weiss
108cd85375 Avoid creating a circular dependency between shared subtasks. 2021-03-24 16:01:36 +01:00
Dawid Weiss
4c2de7ef43 Correct soft task ordering between tidy and any other dependency of regenerate. 2021-03-24 15:39:45 +01:00
Dawid Weiss
bb5db1e16d Correct snowball download/unzip sequence to be always consistent. 2021-03-24 15:39:45 +01:00
Dawid Weiss
34f589b0aa Correct run order between tidy and regenerate's deps. Make snowball not fail on Windows (just emit an error). 2021-03-24 15:39:45 +01:00
Dawid Weiss
27510d5f2f LUCENE-9862: cleanup of all regenerate tasks; moved common code into shared bit. Added failOnError for ant.patch. Included jflexStandardTokenizerImpl. 2021-03-24 15:39:45 +01:00
Robert Muir
945b1cb872
LUCENE-9856: fail precommit on unused local variables, take two (#37)
Enable ecj unused local variable, private instance and method detection. Allow SuppressWarnings("unused") to disable unused checks (e.g. for generated code or very special tests). Fix gradlew regenerate for python 3.9 SuppressWarnings("unused") for generated javacc and jflex code. Enable a few other easy ecj checks such as Deprecated annotation, hashcode/equals, equals across different types.

Co-authored-by: Mike McCandless <mikemccand@apache.org>
2021-03-23 13:59:00 -04:00
Robert Muir
e6c4956cf6
Revert "LUCENE-9856: fail precommit on unused local variables (#34)"
This reverts commit 20dba278bbfc4fec8b53c8371eae982e3fa24b39.
2021-03-23 12:46:36 -04:00
Robert Muir
20dba278bb
LUCENE-9856: fail precommit on unused local variables (#34)
Enable ecj unused local variable, private instance and method detection. Allow SuppressWarnings("unused") to disable unused checks (e.g. for generated code or very special tests). Fix gradlew regenerate for python 3.9 SuppressWarnings("unused") for generated javacc and jflex code. Enable a few other easy ecj checks such as Deprecated annotation, hashcode/equals, equals across different types.

Co-authored-by: Mike McCandless <mikemccand@apache.org>
2021-03-23 11:09:24 -04:00
Dawid Weiss
53bea54669
LUCENE-9375: cleaning up post-split conditional build logic and solr refs. (#22) 2021-03-18 11:04:45 +01:00
Dawid Weiss
fdf486ba54 LUCENE-9375: post-repo-split removal of solr counterpart. 2021-03-10 11:20:08 +01:00
Dawid Weiss
409bc37c13
SOLR-14759: a few initial changes so that Lucene can be built independently while Solr code is still in place. (#2448) 2021-03-08 14:59:08 +01:00
Dawid Weiss
224843a2ba Clean up stale comments a bit. 2021-02-20 20:18:02 +01:00
Robert Muir
dd91f5ca82
LUCENE-9773: upgrade icu to 68.2 (#2372)
Upgrade from icu 62.2 to 68.2, with Unicode 13 support.

Modify GenerateUTR30DataFiles to take the release tag as a program
argument. Gradle populates this automatically, removing a manual step
from regeneration process.
2021-02-15 14:56:13 -05:00
Dawid Weiss
8f56ae0a4b
LUCENE-9767: infrastructure for icu regeneration in place. (#2362) 2021-02-14 21:07:39 +01:00
Dawid Weiss
2cbf261032 LUCENE-9570: code reformatting [final]. 2021-01-05 13:44:05 +01:00
Dawid Weiss
8ef6a0da56 LUCENE-9570: code reformatting [partial]. 2020-12-28 12:26:13 +01:00
Dawid Weiss
2d6ad2fee6 LUCENE-9570: code reformatting [partial]. 2020-12-23 12:41:23 +01:00
Przemek Bruski
ccf3e60453
LUCENE-9021 QueryParser: re-use the LookaheadSuccess exception (#962)
* LUCENE-9021 QueryParser: re-use the LookaheadSuccess exception
Authored-by: Przemek Bruski <pbruski@atlassian.com>
2020-12-12 06:05:46 -08:00
Robert Muir
52f581e351
LUCENE-9605: update snowball to d8cf01ddf37a, adds Yiddish (#2077) 2020-11-14 09:27:08 -05:00
Robert Muir
7eee4fd102
LUCENE-9557: regeneration should use python3, not python2
python2 will change the DFA, but using python3 re-generates the sources
as they exist today. plus, we don't want to depend on EOL python2.
2020-10-03 12:30:22 -04:00
Dawid Weiss
3ae0b50646 LUCENE-9546: Configure Nori and Kuromoji generation lazily when java plugin is applied to the projects 2020-09-29 10:24:17 +02:00
Namgyu Kim
00d7f5ea68
LUCENE-9544: Port Nori dictionary compilation (#1926) 2020-09-28 20:28:21 +09:00
Dawid Weiss
6b0149ec1a Revert "add regenerate gradle script for nori dictionary (#1924)"
This reverts commit e28e8c0e0c87cfdbc7c5df237fe93d2351fce857.
2020-09-28 09:52:34 +02:00
Tomoko Uchida
5e617ccc33
LUCENE-9317: Clean up split package in analyzers-common (#1836) 2020-09-28 16:49:28 +09:00
Namgyu Kim
e28e8c0e0c
add regenerate gradle script for nori dictionary (#1924) 2020-09-28 08:54:27 +02:00
Dawid Weiss
5ec2bac91c
LUCENE-9531: Consolidate duplicated generated classes CharStream and FastCharStream (#1886) 2020-09-18 08:53:30 +02:00
Dawid Weiss
6c9d7adf79
LUCENE-9527: upgrade javacc to 7.0.4 (#1884) 2020-09-17 13:29:18 +02:00
Dawid Weiss
4f344cb0d4
LUCENE-9530: cleaned up javacc gradle generation scripts. (#1883)
* LUCENE-9530: cleaned up gradle javacc generation/ tweaks script so that it's consistent across runs. Removed ant remnants.
2020-09-17 10:53:02 +02:00
Dawid Weiss
d847f40237 LUCENE-9474: make externalTool a function and add a build-stopping message on Windows for snowball generator. 2020-08-30 17:10:18 +02:00
Uwe Schindler
494a8a8e04 LUCENE-9474: Make external tools configurable like in ant through those sysprops: perl.exe, python3.exe, python2.exe 2020-08-23 20:16:22 +02:00
Philippe Ouellet
7a849f6943
LUCENE-9354: Sync French stop words with latest version from Snowball. (#1474)
* Sync French stop words with latest version from Snowball.

This new version removed some French homonyms from the list

* Use latest master commit from snowball-website

* LUCENE-9354: regenerate with 'gradle snowball

* LUCENE-9354: add CHANGES.txt entry
2020-05-01 21:11:35 -04:00
Dawid Weiss
f8a2c39906 LUCENE-9155: add missing naist dictionary generation, clean up the code a bit. 2020-02-21 10:24:05 +01:00
Robert Muir
9302eee1e0
LUCENE-9235: upgrade all python to python3
Die, python2, die.

Some generated .java files change (parameterized automata for
spell-correction).

This is because the order of python dictionaries was not well-defined
previously. A sort() was added so that the python code now generates
reproducible output (Thanks @mikemccand).

So we'll suffer a change once, but the automata are equivalent. If you
run the script again you should not see source code changes.

The relevant unit tests are exhaustive (if you trust the paper!), so we can
be confident it does not break things, even though it looks very scary.
2020-02-20 21:27:38 -05:00
Anshum Gupta
cb18586ea0
LUCENE-9155: Add Apache License header to the Kuromoji dictionary compilation (#1271) 2020-02-20 14:59:06 -08:00
Dawid Weiss
62662e477a LUCENE-9155: Port Kuromoji dictionary compilation (regenerate). 2020-02-20 19:00:56 +01:00
Robert Muir
b9a569e7be
LUCENE-9230: explicitly call python version we want from builds
On newer linux distros, at least, 'python' now means python3. So
we can't rely on what version of python it will invoke (at least for a
few years).

For example in Fedora Linux:

https://fedoraproject.org/wiki/Changes/Python_means_Python3

For python2.x code, explicitly call 'python2.7' and for python3.x code,
explicitly call 'python3'.

Ant variable names are cleaned up, e.g. 'python.exe' is renamed to
'python2.exe' and 'python32.exe' is renamed to 'python3.exe'. This also
makes it easy to identify remaining python 2.x code that should be
migrated to python 3.x
2020-02-18 18:58:17 -05:00
Robert Muir
ccb390d4a6
LUCENE-9220: prevent zip file reproducibility issues based on users umask 2020-02-17 13:34:00 -05:00
Robert Muir
0203815ab2
LUCENE-9220: regenerate all stemmers/stopwords/test data from snowball 2.0 (#1262)
Previous situation:

* The snowball base classes (Among, SnowballProgram, etc) had accumulated local performance-related changes. There was a task that would also "patch" generated classes (e.g. GermanStemmer) after-the-fact.
* Snowball classes had many "non-changes" from the original such as removal of tabs addition of javadocs, license headers, etc.
* Snowball test data (inputs and expected stems) was incorporated into lucene testing, but this was maintained manually. Also files had become large, making the test too slow (Nightly).
* Snowball stopwords lists from their website were manually maintained. In some cases encoding fixes were manually applied.
* Some generated stemmers (such as Estonian and Armenian) exist in lucene, but have no corresponding `.sbl` file in snowball sources at all.

Besides this mess, snowball project is "moving along" and acquiring new languages, adding non-BSD-licensed test data, huge test data, and other complexity. So it is time to automate the integration better.

New situation:

* Lucene has a `gradle snowball` regeneration task. It works on Linux or Mac only. It checks out their repos, applies the `snowball.patch` in our repository, compiles snowball stemmers, regenerates all java code, applies any adjustments so that our build is happy.
* Tests data is automatically regenerated from the commit hash of the snowball test data repository. Not all languages are tested from their data: only where the license is simple BSD. Test data is also (deterministically) sampled, so that we don't have huge files. We just want to make sure our integration works.
* Randomized tests are still set to test every language with generated fake words. The regeneration task ensures all languages get tested (it writes a simple text file list of them).
* Stopword files are automatically regenerated from the commit hash of the snowball website repository.
* The regeneration procedure is idempotent. This way when stuff does change, you know exactly what happened. For example if test data changes to a different license, you may see a git deletion. Or if a new language/stopwords/test data gets added, you will see git additions.
2020-02-17 12:38:01 -05:00
Dawid Weiss
dcf448efeb LUCENE-9134: Minor cleanups. 2020-02-13 11:18:01 +01:00
Erick Erickson
f9357ab0d2
LUCENE-9134: Port ant-regenerate tasks to Gradle build (util and packed) (#1251)
* LUCENE-9134: Port ant-regenerate tasks to Gradle build
2020-02-11 18:56:11 -05:00
Erick Erickson
b0bb299dc4
LUCENE-9134: Port ant-regenerate tasks to Gradle build (#1230)
LUCENE-9134: Port ant-regenerate tasks to Gradle build (Solr javacc)
2020-02-04 09:16:38 -05:00
Erick Erickson
5253c0cb74
LUCENE-9134 Port ant-regenerate tasks to Gradle build (#1226)
LUCENE-9134: Port ant-regenerate tasks to Gradle build Javacc sub-task. Closes #1226
2020-01-31 17:04:10 -05:00
Dawid Weiss
3a8ed5e8ed LUCENE-9134: add python-based regeneration of HTMLCharacterEntities.jflex inside jflexHTMLStripCharFilter. 2020-01-30 13:48:16 +01:00
Dawid Weiss
e25dac085f LUCENE-9134: this adds initial javacc support (without follow-up tweaks required to make the sources identical as those generated by ant). 2020-01-29 17:02:59 +01:00
Robert Muir
975df9ddd3
LUCENE-9182: add apache license headers to all .gradle files and enforce in rat task 2020-01-27 12:05:34 -05:00
Dawid Weiss
6bde0f3ec8 LUCENE-9134: UAX29URLEmailTokenizerImpl regeneration. This requires TONS
of memory and time... insane compared to the size of the input. None of my
machines pass it without at least 12 gigs of heap (!).
2020-01-27 12:36:13 +01:00
Dawid Weiss
ae95f0ab68 LUCENE-9134: lucene:core:jflexStandardTokenizerImpl 2020-01-27 09:03:19 +01:00