Commit Graph

16 Commits

Author SHA1 Message Date
Robert Muir b0bd64c620
LUCENE-9924: generate TLD list from IANA TLD db, rather than root zone db (#77)
This adds a bit of simplicity as the file is a simple domain list,
rather than a DNS zone. So the regexes parsing DNS can be removed.

Also the file may change less often as it contains JUST the list of
TLDs, and not any additional DNS metadata.
2021-04-11 11:25:15 -04:00
Dawid Weiss 39071dbc54
LUCENE-9904: Port GenerateJflexTLDMacros.java regeneration to gradle and regenerate UAX tokenizer with up-to-date TLDs 2021-04-07 10:56:21 +02:00
Dawid Weiss 010e3a1ba9
LUCENE-9900: Regenerate/ run ICU only if inputs changed (#61) 2021-04-02 11:46:43 +02:00
Dawid Weiss e3ae57a3c1
LUCENE-9872: Make the most painful tasks in regenerate fully incremental (#60) 2021-04-02 09:56:47 +02:00
Dawid Weiss 39b8e97613 LUCENE-9896: Add 'quiet exec' utility suppressing exec output unless a failure occurs 2021-03-30 14:38:13 +02:00
Dawid Weiss 4c2de7ef43 Correct soft task ordering between tidy and any other dependency of regenerate. 2021-03-24 15:39:45 +01:00
Dawid Weiss 27510d5f2f LUCENE-9862: cleanup of all regenerate tasks; moved common code into shared bit. Added failOnError for ant.patch. Included jflexStandardTokenizerImpl. 2021-03-24 15:39:45 +01:00
Dawid Weiss 8ef6a0da56 LUCENE-9570: code reformatting [partial]. 2020-12-28 12:26:13 +01:00
Robert Muir 7eee4fd102
LUCENE-9557: regeneration should use python3, not python2
python2 will change the DFA, but using python3 re-generates the sources
as they exist today. plus, we don't want to depend on EOL python2.
2020-10-03 12:30:22 -04:00
Tomoko Uchida 5e617ccc33
LUCENE-9317: Clean up split package in analyzers-common (#1836) 2020-09-28 16:49:28 +09:00
Dawid Weiss d847f40237 LUCENE-9474: make externalTool a function and add a build-stopping message on Windows for snowball generator. 2020-08-30 17:10:18 +02:00
Uwe Schindler 494a8a8e04 LUCENE-9474: Make external tools configurable like in ant through those sysprops: perl.exe, python3.exe, python2.exe 2020-08-23 20:16:22 +02:00
Dawid Weiss 3a8ed5e8ed LUCENE-9134: add python-based regeneration of HTMLCharacterEntities.jflex inside jflexHTMLStripCharFilter. 2020-01-30 13:48:16 +01:00
Robert Muir 975df9ddd3
LUCENE-9182: add apache license headers to all .gradle files and enforce in rat task 2020-01-27 12:05:34 -05:00
Dawid Weiss 6bde0f3ec8 LUCENE-9134: UAX29URLEmailTokenizerImpl regeneration. This requires TONS
of memory and time... insane compared to the size of the input. None of my
machines pass it without at least 12 gigs of heap (!).
2020-01-27 12:36:13 +01:00
Dawid Weiss ae95f0ab68 LUCENE-9134: lucene:core:jflexStandardTokenizerImpl 2020-01-27 09:03:19 +01:00