lucene

Commit Graph

Author	SHA1	Message	Date
Robert Muir	9d15435b15	LUCENE-9916: add a simple regeneration help doc (#73) Add a simple regeneration help doc Improve task help and checksum failure message (include corresponding regeneration task). Sorry for being verbose. Maybe somebody will read it. :) Co-authored-by: Dawid Weiss <dawid.weiss@carrotsearch.com>	2021-04-11 11:28:41 -04:00
Robert Muir	b0bd64c620	LUCENE-9924: generate TLD list from IANA TLD db, rather than root zone db (#77) This adds a bit of simplicity as the file is a simple domain list, rather than a DNS zone. So the regexes parsing DNS can be removed. Also the file may change less often as it contains JUST the list of TLDs, and not any additional DNS metadata.	2021-04-11 11:25:15 -04:00
Robert Muir	15bfb28d7f	LUCENE-9922: checksum files should use a deterministic sort order (#75) This way the files don't unnecessarily change, depending on filesystem order or anything else.	2021-04-10 16:00:55 -04:00
Uwe Schindler	779e00542c	Make the character printout code uniform (always print at least 4 hex chars)	2021-04-08 16:38:31 +02:00
Dawid Weiss	4c2384a1f3	LUCENE-9872: load input/output checksums prior to executing the target task, even if regenerate is not called.	2021-04-08 15:00:20 +02:00
Dawid Weiss	39071dbc54	LUCENE-9904: Port GenerateJflexTLDMacros.java regeneration to gradle and regenerate UAX tokenizer with up-to-date TLDs	2021-04-07 10:56:21 +02:00
Dawid Weiss	fbf9191abf	LUCENE-9901: UnicodeData.java has no regeneration task (#63)	2021-04-05 20:12:56 +02:00
Dawid Weiss	010e3a1ba9	LUCENE-9900: Regenerate/ run ICU only if inputs changed (#61)	2021-04-02 11:46:43 +02:00
Dawid Weiss	e3ae57a3c1	LUCENE-9872: Make the most painful tasks in regenerate fully incremental (#60)	2021-04-02 09:56:47 +02:00
Dawid Weiss	39b8e97613	LUCENE-9896: Add 'quiet exec' utility suppressing exec output unless a failure occurs	2021-03-30 14:38:13 +02:00
Dawid Weiss	3115797463	LUCENE-9871: clean up some old cruft and shuffle files around. Correct inputs/outputs on check broken links so that it's incremental.	2021-03-30 10:55:19 +02:00
Robert Muir	3596e05e5c	LUCENE-9878: enable redundantNullCheck in ecjLint (#44) Detects common cases of unreachable/dead code. For generated javacc code, the check is disabled via SuppressWarnings("unused") because javacc generates strange/bad code such as: if ("" == null) For TestStressNRTReplication's startNode() method, the check is also disabled because analysis folds the "test evilness controls" which are static final constants. This itself is a WTF, shouldn't we instead randomize these evil things in our tests rather than hardcoding them to specific values?	2021-03-27 11:43:47 -04:00
zacharymorn	3ed87c867a	LUCENE-9864: Enforce @Override annotation everywhere (#40) Requiring the annotation is helpful because if an abstract method is removed, the concrete methods will then show up as compile errors: preventing dead code from being accidentally left behind. Co-authored-by: Robert Muir <rmuir@apache.org>	2021-03-25 17:50:38 -04:00
Dawid Weiss	a38713907d	LUCENE-9866: regenerate kuromoji dict in regenerate	2021-03-25 11:43:37 +01:00
Dawid Weiss	108cd85375	Avoid creating a circular dependency between shared subtasks.	2021-03-24 16:01:36 +01:00
Dawid Weiss	4c2de7ef43	Correct soft task ordering between tidy and any other dependency of regenerate.	2021-03-24 15:39:45 +01:00
Dawid Weiss	bb5db1e16d	Correct snowball download/unzip sequence to be always consistent.	2021-03-24 15:39:45 +01:00
Dawid Weiss	34f589b0aa	Correct run order between tidy and regenerate's deps. Make snowball not fail on Windows (just emit an error).	2021-03-24 15:39:45 +01:00
Dawid Weiss	27510d5f2f	LUCENE-9862: cleanup of all regenerate tasks; moved common code into shared bit. Added failOnError for ant.patch. Included jflexStandardTokenizerImpl.	2021-03-24 15:39:45 +01:00
Robert Muir	945b1cb872	LUCENE-9856: fail precommit on unused local variables, take two (#37) Enable ecj unused local variable, private instance and method detection. Allow SuppressWarnings("unused") to disable unused checks (e.g. for generated code or very special tests). Fix gradlew regenerate for python 3.9 SuppressWarnings("unused") for generated javacc and jflex code. Enable a few other easy ecj checks such as Deprecated annotation, hashcode/equals, equals across different types. Co-authored-by: Mike McCandless <mikemccand@apache.org>	2021-03-23 13:59:00 -04:00
Robert Muir	e6c4956cf6	Revert "LUCENE-9856: fail precommit on unused local variables (#34)" This reverts commit `20dba278bb`.	2021-03-23 12:46:36 -04:00
Robert Muir	20dba278bb	LUCENE-9856: fail precommit on unused local variables (#34) Enable ecj unused local variable, private instance and method detection. Allow SuppressWarnings("unused") to disable unused checks (e.g. for generated code or very special tests). Fix gradlew regenerate for python 3.9 SuppressWarnings("unused") for generated javacc and jflex code. Enable a few other easy ecj checks such as Deprecated annotation, hashcode/equals, equals across different types. Co-authored-by: Mike McCandless <mikemccand@apache.org>	2021-03-23 11:09:24 -04:00
Dawid Weiss	53bea54669	LUCENE-9375: cleaning up post-split conditional build logic and solr refs. (#22)	2021-03-18 11:04:45 +01:00
Dawid Weiss	fdf486ba54	LUCENE-9375: post-repo-split removal of solr counterpart.	2021-03-10 11:20:08 +01:00
Dawid Weiss	409bc37c13	SOLR-14759: a few initial changes so that Lucene can be built independently while Solr code is still in place. (#2448)	2021-03-08 14:59:08 +01:00
Dawid Weiss	224843a2ba	Clean up stale comments a bit.	2021-02-20 20:18:02 +01:00
Robert Muir	dd91f5ca82	LUCENE-9773: upgrade icu to 68.2 (#2372) Upgrade from icu 62.2 to 68.2, with Unicode 13 support. Modify GenerateUTR30DataFiles to take the release tag as a program argument. Gradle populates this automatically, removing a manual step from regeneration process.	2021-02-15 14:56:13 -05:00
Dawid Weiss	8f56ae0a4b	LUCENE-9767: infrastructure for icu regeneration in place. (#2362)	2021-02-14 21:07:39 +01:00
Dawid Weiss	2cbf261032	LUCENE-9570: code reformatting [final].	2021-01-05 13:44:05 +01:00
Dawid Weiss	8ef6a0da56	LUCENE-9570: code reformatting [partial].	2020-12-28 12:26:13 +01:00
Dawid Weiss	2d6ad2fee6	LUCENE-9570: code reformatting [partial].	2020-12-23 12:41:23 +01:00
Przemek Bruski	ccf3e60453	LUCENE-9021 QueryParser: re-use the LookaheadSuccess exception (#962) * LUCENE-9021 QueryParser: re-use the LookaheadSuccess exception Authored-by: Przemek Bruski <pbruski@atlassian.com>	2020-12-12 06:05:46 -08:00
Robert Muir	52f581e351	LUCENE-9605: update snowball to d8cf01ddf37a, adds Yiddish (#2077)	2020-11-14 09:27:08 -05:00
Robert Muir	7eee4fd102	LUCENE-9557: regeneration should use python3, not python2 python2 will change the DFA, but using python3 re-generates the sources as they exist today. plus, we don't want to depend on EOL python2.	2020-10-03 12:30:22 -04:00
Dawid Weiss	3ae0b50646	LUCENE-9546: Configure Nori and Kuromoji generation lazily when java plugin is applied to the projects	2020-09-29 10:24:17 +02:00
Namgyu Kim	00d7f5ea68	LUCENE-9544: Port Nori dictionary compilation (#1926)	2020-09-28 20:28:21 +09:00
Dawid Weiss	6b0149ec1a	Revert "add regenerate gradle script for nori dictionary (#1924)" This reverts commit `e28e8c0e0c`.	2020-09-28 09:52:34 +02:00
Tomoko Uchida	5e617ccc33	LUCENE-9317: Clean up split package in analyzers-common (#1836)	2020-09-28 16:49:28 +09:00
Namgyu Kim	e28e8c0e0c	add regenerate gradle script for nori dictionary (#1924)	2020-09-28 08:54:27 +02:00
Dawid Weiss	5ec2bac91c	LUCENE-9531: Consolidate duplicated generated classes CharStream and FastCharStream (#1886)	2020-09-18 08:53:30 +02:00
Dawid Weiss	6c9d7adf79	LUCENE-9527: upgrade javacc to 7.0.4 (#1884)	2020-09-17 13:29:18 +02:00
Dawid Weiss	4f344cb0d4	LUCENE-9530: cleaned up javacc gradle generation scripts. (#1883) * LUCENE-9530: cleaned up gradle javacc generation/ tweaks script so that it's consistent across runs. Removed ant remnants.	2020-09-17 10:53:02 +02:00
Dawid Weiss	d847f40237	LUCENE-9474: make externalTool a function and add a build-stopping message on Windows for snowball generator.	2020-08-30 17:10:18 +02:00
Uwe Schindler	494a8a8e04	LUCENE-9474: Make external tools configurable like in ant through those sysprops: perl.exe, python3.exe, python2.exe	2020-08-23 20:16:22 +02:00
Philippe Ouellet	7a849f6943	LUCENE-9354: Sync French stop words with latest version from Snowball. (#1474) * Sync French stop words with latest version from Snowball. This new version removed some French homonyms from the list * Use latest master commit from snowball-website * LUCENE-9354: regenerate with 'gradle snowball * LUCENE-9354: add CHANGES.txt entry	2020-05-01 21:11:35 -04:00
Dawid Weiss	f8a2c39906	LUCENE-9155: add missing naist dictionary generation, clean up the code a bit.	2020-02-21 10:24:05 +01:00
Robert Muir	9302eee1e0	LUCENE-9235: upgrade all python to python3 Die, python2, die. Some generated .java files change (parameterized automata for spell-correction). This is because the order of python dictionaries was not well-defined previously. A sort() was added so that the python code now generates reproducible output (Thanks @mikemccand). So we'll suffer a change once, but the automata are equivalent. If you run the script again you should not see source code changes. The relevant unit tests are exhaustive (if you trust the paper!), so we can be confident it does not break things, even though it looks very scary.	2020-02-20 21:27:38 -05:00
Anshum Gupta	cb18586ea0	LUCENE-9155: Add Apache License header to the Kuromoji dictionary compilation (#1271)	2020-02-20 14:59:06 -08:00
Dawid Weiss	62662e477a	LUCENE-9155: Port Kuromoji dictionary compilation (regenerate).	2020-02-20 19:00:56 +01:00
Robert Muir	b9a569e7be	LUCENE-9230: explicitly call python version we want from builds On newer linux distros, at least, 'python' now means python3. So we can't rely on what version of python it will invoke (at least for a few years). For example in Fedora Linux: https://fedoraproject.org/wiki/Changes/Python_means_Python3 For python2.x code, explicitly call 'python2.7' and for python3.x code, explicitly call 'python3'. Ant variable names are cleaned up, e.g. 'python.exe' is renamed to 'python2.exe' and 'python32.exe' is renamed to 'python3.exe'. This also makes it easy to identify remaining python 2.x code that should be migrated to python 3.x	2020-02-18 18:58:17 -05:00
Robert Muir	ccb390d4a6	LUCENE-9220: prevent zip file reproducibility issues based on users umask	2020-02-17 13:34:00 -05:00
Robert Muir	0203815ab2	LUCENE-9220: regenerate all stemmers/stopwords/test data from snowball 2.0 (#1262) Previous situation: * The snowball base classes (Among, SnowballProgram, etc) had accumulated local performance-related changes. There was a task that would also "patch" generated classes (e.g. GermanStemmer) after-the-fact. * Snowball classes had many "non-changes" from the original such as removal of tabs addition of javadocs, license headers, etc. * Snowball test data (inputs and expected stems) was incorporated into lucene testing, but this was maintained manually. Also files had become large, making the test too slow (Nightly). * Snowball stopwords lists from their website were manually maintained. In some cases encoding fixes were manually applied. * Some generated stemmers (such as Estonian and Armenian) exist in lucene, but have no corresponding `.sbl` file in snowball sources at all. Besides this mess, snowball project is "moving along" and acquiring new languages, adding non-BSD-licensed test data, huge test data, and other complexity. So it is time to automate the integration better. New situation: * Lucene has a `gradle snowball` regeneration task. It works on Linux or Mac only. It checks out their repos, applies the `snowball.patch` in our repository, compiles snowball stemmers, regenerates all java code, applies any adjustments so that our build is happy. * Tests data is automatically regenerated from the commit hash of the snowball test data repository. Not all languages are tested from their data: only where the license is simple BSD. Test data is also (deterministically) sampled, so that we don't have huge files. We just want to make sure our integration works. * Randomized tests are still set to test every language with generated fake words. The regeneration task ensures all languages get tested (it writes a simple text file list of them). * Stopword files are automatically regenerated from the commit hash of the snowball website repository. * The regeneration procedure is idempotent. This way when stuff does change, you know exactly what happened. For example if test data changes to a different license, you may see a git deletion. Or if a new language/stopwords/test data gets added, you will see git additions.	2020-02-17 12:38:01 -05:00
Dawid Weiss	dcf448efeb	LUCENE-9134: Minor cleanups.	2020-02-13 11:18:01 +01:00
Erick Erickson	f9357ab0d2	LUCENE-9134: Port ant-regenerate tasks to Gradle build (util and packed) (#1251) * LUCENE-9134: Port ant-regenerate tasks to Gradle build	2020-02-11 18:56:11 -05:00
Erick Erickson	b0bb299dc4	LUCENE-9134: Port ant-regenerate tasks to Gradle build (#1230) LUCENE-9134: Port ant-regenerate tasks to Gradle build (Solr javacc)	2020-02-04 09:16:38 -05:00
Erick Erickson	5253c0cb74	LUCENE-9134 Port ant-regenerate tasks to Gradle build (#1226) LUCENE-9134: Port ant-regenerate tasks to Gradle build Javacc sub-task. Closes #1226	2020-01-31 17:04:10 -05:00
Dawid Weiss	3a8ed5e8ed	LUCENE-9134: add python-based regeneration of HTMLCharacterEntities.jflex inside jflexHTMLStripCharFilter.	2020-01-30 13:48:16 +01:00
Dawid Weiss	e25dac085f	LUCENE-9134: this adds initial javacc support (without follow-up tweaks required to make the sources identical as those generated by ant).	2020-01-29 17:02:59 +01:00
Robert Muir	975df9ddd3	LUCENE-9182: add apache license headers to all .gradle files and enforce in rat task	2020-01-27 12:05:34 -05:00
Dawid Weiss	6bde0f3ec8	LUCENE-9134: UAX29URLEmailTokenizerImpl regeneration. This requires TONS of memory and time... insane compared to the size of the input. None of my machines pass it without at least 12 gigs of heap (!).	2020-01-27 12:36:13 +01:00
Dawid Weiss	ae95f0ab68	LUCENE-9134: lucene:core:jflexStandardTokenizerImpl	2020-01-27 09:03:19 +01:00

111 Commits