Commit Graph

64 Commits

Author SHA1 Message Date
Dawid Weiss 0b1d8ccba6
LUCENE-9925: add checksums to snowball-generated files (#80) 2021-04-13 08:59:31 +02:00
Dawid Weiss 3f3917d504 LUCENE-9914: remove stale file. 2021-04-12 20:19:14 +02:00
Dawid Weiss f91700a713
LUCENE-9914: Modernize Emoji regeneration scripts (#78) 2021-04-12 20:16:43 +02:00
Robert Muir 9d15435b15
LUCENE-9916: add a simple regeneration help doc (#73)
Add a simple regeneration help doc

Improve task help and checksum failure message (include corresponding regeneration task). Sorry for being verbose. Maybe somebody will read it. :)

Co-authored-by: Dawid Weiss <dawid.weiss@carrotsearch.com>
2021-04-11 11:28:41 -04:00
Robert Muir b0bd64c620
LUCENE-9924: generate TLD list from IANA TLD db, rather than root zone db (#77)
This adds a bit of simplicity as the file is a simple domain list,
rather than a DNS zone. So the regexes parsing DNS can be removed.

Also the file may change less often as it contains JUST the list of
TLDs, and not any additional DNS metadata.
2021-04-11 11:25:15 -04:00
Robert Muir 15bfb28d7f
LUCENE-9922: checksum files should use a deterministic sort order (#75)
This way the files don't unnecessarily change, depending on filesystem
order or anything else.
2021-04-10 16:00:55 -04:00
Uwe Schindler 779e00542c Make the character printout code uniform (always print at least 4 hex chars) 2021-04-08 16:38:31 +02:00
Dawid Weiss 4c2384a1f3 LUCENE-9872: load input/output checksums prior to executing the target task, even if regenerate is not called. 2021-04-08 15:00:20 +02:00
Dawid Weiss 39071dbc54
LUCENE-9904: Port GenerateJflexTLDMacros.java regeneration to gradle and regenerate UAX tokenizer with up-to-date TLDs 2021-04-07 10:56:21 +02:00
Dawid Weiss fbf9191abf
LUCENE-9901: UnicodeData.java has no regeneration task (#63) 2021-04-05 20:12:56 +02:00
Dawid Weiss 010e3a1ba9
LUCENE-9900: Regenerate/ run ICU only if inputs changed (#61) 2021-04-02 11:46:43 +02:00
Dawid Weiss e3ae57a3c1
LUCENE-9872: Make the most painful tasks in regenerate fully incremental (#60) 2021-04-02 09:56:47 +02:00
Dawid Weiss 39b8e97613 LUCENE-9896: Add 'quiet exec' utility suppressing exec output unless a failure occurs 2021-03-30 14:38:13 +02:00
Dawid Weiss 3115797463 LUCENE-9871: clean up some old cruft and shuffle files around. Correct inputs/outputs on check broken links so that it's incremental. 2021-03-30 10:55:19 +02:00
Robert Muir 3596e05e5c
LUCENE-9878: enable redundantNullCheck in ecjLint (#44)
Detects common cases of unreachable/dead code.

For generated javacc code, the check is disabled via
SuppressWarnings("unused") because javacc generates strange/bad code such as:

  if ("" == null)

For TestStressNRTReplication's startNode() method, the check is also
disabled because analysis folds the "test evilness controls" which are
static final constants. This itself is a WTF, shouldn't we instead
randomize these evil things in our tests rather than hardcoding them to
specific values?
2021-03-27 11:43:47 -04:00
zacharymorn 3ed87c867a
LUCENE-9864: Enforce @Override annotation everywhere (#40)
Requiring the annotation is helpful because if an abstract method is removed, the concrete methods will then show up as compile errors: preventing dead code from being accidentally left behind.

Co-authored-by: Robert Muir <rmuir@apache.org>
2021-03-25 17:50:38 -04:00
Dawid Weiss a38713907d LUCENE-9866: regenerate kuromoji dict in regenerate 2021-03-25 11:43:37 +01:00
Dawid Weiss 108cd85375 Avoid creating a circular dependency between shared subtasks. 2021-03-24 16:01:36 +01:00
Dawid Weiss 4c2de7ef43 Correct soft task ordering between tidy and any other dependency of regenerate. 2021-03-24 15:39:45 +01:00
Dawid Weiss bb5db1e16d Correct snowball download/unzip sequence to be always consistent. 2021-03-24 15:39:45 +01:00
Dawid Weiss 34f589b0aa Correct run order between tidy and regenerate's deps. Make snowball not fail on Windows (just emit an error). 2021-03-24 15:39:45 +01:00
Dawid Weiss 27510d5f2f LUCENE-9862: cleanup of all regenerate tasks; moved common code into shared bit. Added failOnError for ant.patch. Included jflexStandardTokenizerImpl. 2021-03-24 15:39:45 +01:00
Robert Muir 945b1cb872
LUCENE-9856: fail precommit on unused local variables, take two (#37)
Enable ecj unused local variable, private instance and method detection. Allow SuppressWarnings("unused") to disable unused checks (e.g. for generated code or very special tests). Fix gradlew regenerate for python 3.9 SuppressWarnings("unused") for generated javacc and jflex code. Enable a few other easy ecj checks such as Deprecated annotation, hashcode/equals, equals across different types.

Co-authored-by: Mike McCandless <mikemccand@apache.org>
2021-03-23 13:59:00 -04:00
Robert Muir e6c4956cf6
Revert "LUCENE-9856: fail precommit on unused local variables (#34)"
This reverts commit 20dba278bb.
2021-03-23 12:46:36 -04:00
Robert Muir 20dba278bb
LUCENE-9856: fail precommit on unused local variables (#34)
Enable ecj unused local variable, private instance and method detection. Allow SuppressWarnings("unused") to disable unused checks (e.g. for generated code or very special tests). Fix gradlew regenerate for python 3.9 SuppressWarnings("unused") for generated javacc and jflex code. Enable a few other easy ecj checks such as Deprecated annotation, hashcode/equals, equals across different types.

Co-authored-by: Mike McCandless <mikemccand@apache.org>
2021-03-23 11:09:24 -04:00
Dawid Weiss 53bea54669
LUCENE-9375: cleaning up post-split conditional build logic and solr refs. (#22) 2021-03-18 11:04:45 +01:00
Dawid Weiss fdf486ba54 LUCENE-9375: post-repo-split removal of solr counterpart. 2021-03-10 11:20:08 +01:00
Dawid Weiss 409bc37c13
SOLR-14759: a few initial changes so that Lucene can be built independently while Solr code is still in place. (#2448) 2021-03-08 14:59:08 +01:00
Dawid Weiss 224843a2ba Clean up stale comments a bit. 2021-02-20 20:18:02 +01:00
Robert Muir dd91f5ca82
LUCENE-9773: upgrade icu to 68.2 (#2372)
Upgrade from icu 62.2 to 68.2, with Unicode 13 support.

Modify GenerateUTR30DataFiles to take the release tag as a program
argument. Gradle populates this automatically, removing a manual step
from regeneration process.
2021-02-15 14:56:13 -05:00
Dawid Weiss 8f56ae0a4b
LUCENE-9767: infrastructure for icu regeneration in place. (#2362) 2021-02-14 21:07:39 +01:00
Dawid Weiss 2cbf261032 LUCENE-9570: code reformatting [final]. 2021-01-05 13:44:05 +01:00
Dawid Weiss 8ef6a0da56 LUCENE-9570: code reformatting [partial]. 2020-12-28 12:26:13 +01:00
Dawid Weiss 2d6ad2fee6 LUCENE-9570: code reformatting [partial]. 2020-12-23 12:41:23 +01:00
Przemek Bruski ccf3e60453
LUCENE-9021 QueryParser: re-use the LookaheadSuccess exception (#962)
* LUCENE-9021 QueryParser: re-use the LookaheadSuccess exception
Authored-by: Przemek Bruski <pbruski@atlassian.com>
2020-12-12 06:05:46 -08:00
Robert Muir 52f581e351
LUCENE-9605: update snowball to d8cf01ddf37a, adds Yiddish (#2077) 2020-11-14 09:27:08 -05:00
Robert Muir 7eee4fd102
LUCENE-9557: regeneration should use python3, not python2
python2 will change the DFA, but using python3 re-generates the sources
as they exist today. plus, we don't want to depend on EOL python2.
2020-10-03 12:30:22 -04:00
Dawid Weiss 3ae0b50646 LUCENE-9546: Configure Nori and Kuromoji generation lazily when java plugin is applied to the projects 2020-09-29 10:24:17 +02:00
Namgyu Kim 00d7f5ea68
LUCENE-9544: Port Nori dictionary compilation (#1926) 2020-09-28 20:28:21 +09:00
Dawid Weiss 6b0149ec1a Revert "add regenerate gradle script for nori dictionary (#1924)"
This reverts commit e28e8c0e0c.
2020-09-28 09:52:34 +02:00
Tomoko Uchida 5e617ccc33
LUCENE-9317: Clean up split package in analyzers-common (#1836) 2020-09-28 16:49:28 +09:00
Namgyu Kim e28e8c0e0c
add regenerate gradle script for nori dictionary (#1924) 2020-09-28 08:54:27 +02:00
Dawid Weiss 5ec2bac91c
LUCENE-9531: Consolidate duplicated generated classes CharStream and FastCharStream (#1886) 2020-09-18 08:53:30 +02:00
Dawid Weiss 6c9d7adf79
LUCENE-9527: upgrade javacc to 7.0.4 (#1884) 2020-09-17 13:29:18 +02:00
Dawid Weiss 4f344cb0d4
LUCENE-9530: cleaned up javacc gradle generation scripts. (#1883)
* LUCENE-9530: cleaned up gradle javacc generation/ tweaks script so that it's consistent across runs. Removed ant remnants.
2020-09-17 10:53:02 +02:00
Dawid Weiss d847f40237 LUCENE-9474: make externalTool a function and add a build-stopping message on Windows for snowball generator. 2020-08-30 17:10:18 +02:00
Uwe Schindler 494a8a8e04 LUCENE-9474: Make external tools configurable like in ant through those sysprops: perl.exe, python3.exe, python2.exe 2020-08-23 20:16:22 +02:00
Philippe Ouellet 7a849f6943
LUCENE-9354: Sync French stop words with latest version from Snowball. (#1474)
* Sync French stop words with latest version from Snowball.

This new version removed some French homonyms from the list

* Use latest master commit from snowball-website

* LUCENE-9354: regenerate with 'gradle snowball

* LUCENE-9354: add CHANGES.txt entry
2020-05-01 21:11:35 -04:00
Dawid Weiss f8a2c39906 LUCENE-9155: add missing naist dictionary generation, clean up the code a bit. 2020-02-21 10:24:05 +01:00
Robert Muir 9302eee1e0
LUCENE-9235: upgrade all python to python3
Die, python2, die.

Some generated .java files change (parameterized automata for
spell-correction).

This is because the order of python dictionaries was not well-defined
previously. A sort() was added so that the python code now generates
reproducible output (Thanks @mikemccand).

So we'll suffer a change once, but the automata are equivalent. If you
run the script again you should not see source code changes.

The relevant unit tests are exhaustive (if you trust the paper!), so we can
be confident it does not break things, even though it looks very scary.
2020-02-20 21:27:38 -05:00