Regeneration
============

Lucene has a number of machine-generated resources - some of these are
resource (binary) files, others are Java source files that are stored
(and compiled) with the rest of the Lucene source code.

If you're reading this, chances are that:

1) you've hit a precommit check error that said you've modified a generated
   resource and some checksums are out of sync.

2) you need to regenerate one (or more) of these resources.

In many cases hitting (1) means you'll have to do (2), so let's discuss
these in order.


Checksum validation errors
--------------------------

LUCENE-9868 introduced a system of storing (and validating) checksums of
generated files so that they are not accidentally modified. This checksum
system will fail the build with a message similar to this one:

  Execution failed for task ':lucene:core:generateStandardTokenizerChecksumCheck'.
  > Checksums mismatch for derived resources; you might have modified a generated resource (regenerate task: :lucene:core:generateStandardTokenizerIfChanged):
    Actual:
      lucene/core/[...]/StandardTokenizerImpl.java=3298326986432483248962398462938649869326

    Expected:
      lucene/core/[...]/StandardTokenizerImpl.java=8e33c2698446c1c7a9479796a41316d1932ceda8

The message shows you which resources have mismatched checksums (in this case
StandardTokenizerImpl.java), the *module* where the generated resource lives
and the *task name* that should be used to regenerate it:

  :lucene:core:generateStandardTokenizerIfChanged

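For intuition, here is a rough Python sketch of what a check like this boils
down to. It is not Lucene's actual gradle code: the function names are made up
for illustration, and the use of SHA-1 is an assumption (the 40-hex-digit
values in the message above are merely consistent with it).

```python
import hashlib
from pathlib import Path


def sha1_of(path: Path) -> str:
    """Hex SHA-1 digest of a file's contents (SHA-1 is an assumption)."""
    return hashlib.sha1(path.read_bytes()).hexdigest()


def check_checksums(expected: dict, regenerate_task: str) -> None:
    """Hypothetical check: fail if any file differs from its stored checksum.

    expected maps a file path to the checksum saved at the last regeneration.
    """
    actual = {name: sha1_of(Path(name)) for name in expected}
    changed = sorted(name for name in expected if actual[name] != expected[name])
    if not changed:
        return  # everything is in sync, nothing to report
    lines = [
        "Checksums mismatch for derived resources; you might have modified "
        f"a generated resource (regenerate task: {regenerate_task}):",
        "Actual:",
    ]
    lines += [f"  {name}={actual[name]}" for name in changed]
    lines += ["Expected:"]
    lines += [f"  {name}={expected[name]}" for name in changed]
    raise RuntimeError("\n".join(lines))
```

The point of the sketch is only that "Expected" is whatever was saved when the
resource was last regenerated, and "Actual" is what is on disk right now.
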
To resolve the problem, try to:

1) "git diff" the changes that caused the build failure (to see why the checksums
   changed) and then decide whether to update the generated resource's template (or
   whatever else is used to emit the generated resource);

2) regenerate the derived resources, possibly saving new checksums. If you decide to
   regenerate, just run the task hinted at in the error message, for example:

  gradlew :lucene:core:generateStandardTokenizerIfChanged

This regenerates all resources the task "generateStandardTokenizer" produces
and updates the corresponding checksums.

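If you ever want to script this step, the task name can be pulled out of the
failure message. The snippet below is a small sketch, assuming the message
keeps the "(regenerate task: ...)" shape shown in the example above; the helper
name is made up and is not part of the Lucene build.

```python
import re

# Matches the "(regenerate task: :some:module:taskName)" hint in the message.
_TASK_PATTERN = re.compile(r"\(regenerate task:\s*(:[\w:.\-]+)\)")


def extract_regenerate_task(message: str):
    """Return the hinted gradle task path, or None if no hint is present."""
    match = _TASK_PATTERN.search(message)
    return match.group(1) if match else None
```

The returned value is what you would pass to gradlew as the task to run.

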
Resource regeneration
---------------------

The "convention" task for regenerating all derived resources in a given
module is called "regenerate" and you can apply it to all Lucene modules
by running:

  gradlew regenerate

It is typically much wiser to limit the scope of regeneration to only
the module you're working with though:

  gradlew -p lucene/analysis/common regenerate

If you're interested in what specific generation tasks are available, see
the task list for the generation group:

  gradlew tasks --group generation

or limit the output to a particular module:

  gradlew -p lucene/analysis/common tasks --group generation

which displays (at the moment of writing):

  generateClassicTokenizer - Regenerate ClassicTokenizerImpl.java (if sources changed)
  generateHTMLStripCharFilter - Regenerate HTMLStripCharFilter.java (if sources changed)
  generateTlds - Regenerate top-level domain jflex macros and tests (if sources changed)
  generateUAX29URLEmailTokenizer - Regenerate UAX29URLEmailTokenizerImpl.java (if sources changed)
  generateWikipediaTokenizer - Regenerate WikipediaTokenizerImpl.java (if sources changed)
  regenerate - Rerun any code or static data generation tasks.
  snowball - Regenerates snowball stemmers.

You may wonder why none of these tasks actually exist in gradle source files. What
the build files declare are identically named tasks with the suffix "Internal"; the
tasks listed above are wrappers around them, as explained in the next section.


Resource checksums, incremental generation and advanced topics
--------------------------------------------------------------

Many resource generation tasks require specific tools (perl, python, bash shell)
and resources that may not be available on all platforms. In LUCENE-9868 we tried
to make resource generation tasks "incremental" so that they only run if their
sources (or outputs) have changed. So if you run the generic "regenerate" task, many of the
actual regeneration sub-tasks will be "skipped" - you can see this if you run gradle with
plain console, for example:

  gradlew -p lucene/analysis/common regenerate --console=plain

  ...
  > Task :lucene:analysis:common:generateUnicodeProps
  Checksums consistent with sources, skipping task: :lucene:analysis:common:generateUnicodePropsInternal
  ...

This shouldn't worry you at all - the internal tasks are skipped by wrappers
if the inputs and outputs of the internal task have not changed. If they have changed,
the task is re-run and followed up by other tasks, such as code-formatting (tidy).

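The skip-or-run decision can be pictured with a small Python sketch. This is
an illustration only, not the actual gradle wrapper: the function names, the
JSON checksum file and the use of SHA-1 are all assumptions made for the
example.

```python
import hashlib
import json
from pathlib import Path


def sha1_of(path: Path) -> str:
    """Hex SHA-1 digest of a file's contents (SHA-1 is an assumption)."""
    return hashlib.sha1(path.read_bytes()).hexdigest()


def current_checksums(files) -> dict:
    """Map each existing file (as a string path) to its current checksum."""
    return {str(p): sha1_of(p) for p in sorted(files) if p.exists()}


def run_if_changed(inputs, outputs, checksum_file: Path, generate) -> bool:
    """Hypothetical wrapper: run generate() only if inputs or outputs changed.

    Returns True if the generator ran, False if it was skipped.
    """
    tracked = list(inputs) + list(outputs)
    saved = json.loads(checksum_file.read_text()) if checksum_file.exists() else None
    if saved is not None and saved == current_checksums(tracked):
        return False  # checksums consistent with sources: skip the internal task
    generate()  # stands in for the "Internal" task; follow-ups (tidy) would go here
    # Checksums are saved last, once generation and any follow-up have finished.
    checksum_file.write_text(json.dumps(current_checksums(tracked), indent=2))
    return True
```

Note that a hand edit to an output file changes its checksum too, so the
sketch regenerates in that case just as it does when an input changes.
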
Of course, sometimes you may want to *force* the regeneration task to run, even if the
checksums indicate nothing has changed. There are a couple of reasons you might want this:

- the generation task has outputs but no inputs or the inputs are volatile. In this case
  only the outputs have checksums and the task will be skipped if the outputs haven't changed.

- you may want to run the regeneration task just to see that it actually runs and produces
  the same checksums (git diff should be clean). This would be a wise periodic sanity check
  to ensure everything works as expected.

If you want to force-run the regeneration, use gradle's "--rerun-tasks" option:

  gradlew regenerate --rerun-tasks

Scoping the call to a particular module will also work:

  gradlew -p lucene/analysis/common regenerate --rerun-tasks

Scoping the call to a particular task will also work:

  gradlew -p lucene/analysis/common generateUnicodeProps --rerun-tasks

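The periodic sanity check mentioned earlier (force a regeneration, then
confirm that git diff is clean) is easy to script. Below is a minimal Python
sketch; the function name is made up, and it assumes you run it from the
repository root on a system where the wrapper is invoked as "./gradlew".

```python
import subprocess


def regeneration_is_stable(module_dir, gradlew="./gradlew", run=subprocess.run):
    """Force-regenerate one module, then report whether git sees no changes.

    module_dir: module path relative to the repository root
    gradlew:    how to invoke the gradle wrapper (an assumption, see above)
    run:        command runner, injectable so this can be tried without gradle
    """
    # Force the module's generation tasks to run, ignoring saved checksums.
    run([gradlew, "-p", module_dir, "regenerate", "--rerun-tasks"], check=True)
    # "git diff --exit-code" exits with 1 when tracked files have changes.
    result = run(["git", "diff", "--exit-code"], check=False)
    return result.returncode == 0
```

A False result means the regenerated files differ from what is checked in,
which is worth investigating before committing anything.
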
You *should not* call the underlying generation task directly; this is possible
but discouraged:

  gradlew -p lucene/analysis/common generateUnicodePropsInternal --rerun-tasks

The reason is that some of these generation tasks require follow-up (for example
source code tidying) and, more importantly, the checksums for these
regenerated resources won't be saved (so the next time you run 'check' it'll fail
with checksum mismatches).

Finally, if you do feel like force-regenerating everything, remember to exclude this
monster...

  gradlew regenerate -x generateUAX29URLEmailTokenizerInternal --rerun-tasks

and on Windows, exclude snowball regeneration (requires bash):

  gradlew regenerate -x generateUAX29URLEmailTokenizerInternal -x snowball --rerun-tasks