23 Commits

Author SHA1 Message Date
Dawid Weiss b48cac0206
LUCENE-10285: try to force ordering of internal tasks, in spite of making top-level wrapper dependencies. (#549) 2021-12-17 19:12:09 +01:00
Robert Muir c8f5b9127d
LUCENE-10243: increase unicode versions of tokenizers to 12.1 (#465)
* Bump %unicode 9 -> %unicode 12.1 for the 3 unicode grammars
* regenerate emoji conformance tests for unicode 12.1
* modify wordbreak conformance tests to use emoji data (which replaces old crazy E_base etc properties)
* regenerate wordbreak conformance tests
* Simplify grammar files and word-break conformance test generator, now that full-width numbers are WordBreak=Numeric
* Use jflex emoji properties rather than ICU-generated ones
2021-12-03 20:20:57 -05:00
Robert Muir af831d2810
LUCENE-10239: upgrade jflex (1.7.0 -> 1.8.2) (#452)
Upgrade jflex.

Change doesn't alter the behavior of any of the analyzers (unicode
version or grammar refactorings), just the minimal to get new tooling
working.
2021-11-19 09:24:27 -05:00
Dawid Weiss 0cbafa4879 Fix gradle error hints. 2021-08-25 10:03:59 +02:00
Dawid Weiss 8eb4eb2611
LUCENE-9909: add checksums of included files for some jflex generation tasks. Fix a task ordering issue with spotless. (#121)
* LUCENE-9909: Some jflex regeneration tasks should have proper dependencies and also check the checksums of included files.

* Force a dependency on low-level spotless tasks so that they're always properly ordered (hell!). Update ASCIITLD and regenerate the remaining code. Add cross-dependencies between generation tasks that take includes as input.
2021-05-02 19:17:18 +02:00
Dawid Weiss bd8f182b13
LUCENE-9933: Add non-file properties to wrapped regenerate checksums (#95) 2021-04-19 13:37:47 +02:00
Dawid Weiss beafd113de
LUCENE-9931: Rename checksummed regen. tasks FooInternal and generated wrappers Foo (#88) 2021-04-16 22:35:51 +02:00
Robert Muir b0bd64c620
LUCENE-9924: generate TLD list from IANA TLD db, rather than root zone db (#77)
This adds a bit of simplicity as the file is a simple domain list,
rather than a DNS zone. So the regexes parsing DNS can be removed.

Also the file may change less often as it contains JUST the list of
TLDs, and not any additional DNS metadata.
2021-04-11 11:25:15 -04:00
Dawid Weiss 39071dbc54
LUCENE-9904: Port GenerateJflexTLDMacros.java regeneration to gradle and regenerate UAX tokenizer with up-to-date TLDs 2021-04-07 10:56:21 +02:00
Dawid Weiss 010e3a1ba9
LUCENE-9900: Regenerate/ run ICU only if inputs changed (#61) 2021-04-02 11:46:43 +02:00
Dawid Weiss e3ae57a3c1
LUCENE-9872: Make the most painful tasks in regenerate fully incremental (#60) 2021-04-02 09:56:47 +02:00
Dawid Weiss 39b8e97613 LUCENE-9896: Add 'quiet exec' utility suppressing exec output unless a failure occurs 2021-03-30 14:38:13 +02:00
Dawid Weiss 4c2de7ef43 Correct soft task ordering between tidy and any other dependency of regenerate. 2021-03-24 15:39:45 +01:00
Dawid Weiss 27510d5f2f LUCENE-9862: cleanup of all regenerate tasks; moved common code into shared bit. Added failOnError for ant.patch. Included jflexStandardTokenizerImpl. 2021-03-24 15:39:45 +01:00
Dawid Weiss 8ef6a0da56 LUCENE-9570: code reformatting [partial]. 2020-12-28 12:26:13 +01:00
Robert Muir 7eee4fd102
LUCENE-9557: regeneration should use python3, not python2
python2 will change the DFA, but using python3 re-generates the sources
as they exist today. plus, we don't want to depend on EOL python2.
2020-10-03 12:30:22 -04:00
Tomoko Uchida 5e617ccc33
LUCENE-9317: Clean up split package in analyzers-common (#1836) 2020-09-28 16:49:28 +09:00
Dawid Weiss d847f40237 LUCENE-9474: make externalTool a function and add a build-stopping message on Windows for snowball generator. 2020-08-30 17:10:18 +02:00
Uwe Schindler 494a8a8e04 LUCENE-9474: Make external tools configurable like in ant through those sysprops: perl.exe, python3.exe, python2.exe 2020-08-23 20:16:22 +02:00
Dawid Weiss 3a8ed5e8ed LUCENE-9134: add python-based regeneration of HTMLCharacterEntities.jflex inside jflexHTMLStripCharFilter. 2020-01-30 13:48:16 +01:00
Robert Muir 975df9ddd3
LUCENE-9182: add apache license headers to all .gradle files and enforce in rat task 2020-01-27 12:05:34 -05:00
Dawid Weiss 6bde0f3ec8 LUCENE-9134: UAX29URLEmailTokenizerImpl regeneration. This requires TONS
of memory and time... insane compared to the size of the input. None of my
machines pass it without at least 12 gigs of heap (!).
2020-01-27 12:36:13 +01:00
Dawid Weiss ae95f0ab68 LUCENE-9134: lucene:core:jflexStandardTokenizerImpl 2020-01-27 09:03:19 +01:00