Commit Graph

2 Commits

Author SHA1 Message Date
Robert Muir c8f5b9127d
LUCENE-10243: increase unicode versions of tokenizers to 12.1 (#465)
* Bump %unicode 9 -> %unicode 12.1 for the 3 unicode grammars
* regenerate emoji conformance tests for unicode 12.1
* modify wordbreak conformance tests to use emoji data (which replaces old crazy E_base etc properties)
* regenerate wordbreak conformance tests
* Simplify grammar files and word-break conformance test generator, now that full-width numbers are WordBreak=Numeric
* Use jflex emoji properties rather than ICU-generated ones
2021-12-03 20:20:57 -05:00
Dawid Weiss f91700a713
LUCENE-9914: Modernize Emoji regeneration scripts (#78)
2021-04-12 20:16:43 +02:00