Add a simple regeneration help doc
Improve task help and checksum failure message (include corresponding regeneration task). Sorry for being verbose. Maybe somebody will read it. :)
Co-authored-by: Dawid Weiss <dawid.weiss@carrotsearch.com>
This adds a bit of simplicity as the file is a simple domain list,
rather than a DNS zone. So the regexes parsing DNS can be removed.
Also the file may change less often as it contains JUST the list of
TLDs, and not any additional DNS metadata.
Fails the linter if an exception is swallowed (e.g. variable completely
unused).
If this is intentional for some reason, the exception can simply by
annotated with @SuppressWarnings("unused").
This enables quite a few javac warnings from java11+ that weren't
enabled for some reason. None of them fail, so lock them in.
Additionally some newer checks are only recognized for newer JDK
versions, so they are only enabled based on the javac version used. They
will cause no annoyance because they relate to newer language features.
Detects common cases of unreachable/dead code.
For generated javacc code, the check is disabled via
SuppressWarnings("unused") because javacc generates strange/bad code such as:
if ("" == null)
For TestStressNRTReplication's startNode() method, the check is also
disabled because analysis folds the "test evilness controls" which are
static final constants. This itself is a WTF, shouldn't we instead
randomize these evil things in our tests rather than hardcoding them to
specific values?
Requiring the annotation is helpful because if an abstract method is removed, the concrete methods will then show up as compile errors: preventing dead code from being accidentally left behind.
Co-authored-by: Robert Muir <rmuir@apache.org>
Enable ecj unused local variable, private instance and method detection. Allow SuppressWarnings("unused") to disable unused checks (e.g. for generated code or very special tests). Fix gradlew regenerate for python 3.9 SuppressWarnings("unused") for generated javacc and jflex code. Enable a few other easy ecj checks such as Deprecated annotation, hashcode/equals, equals across different types.
Co-authored-by: Mike McCandless <mikemccand@apache.org>
Enable ecj unused local variable, private instance and method detection. Allow SuppressWarnings("unused") to disable unused checks (e.g. for generated code or very special tests). Fix gradlew regenerate for python 3.9 SuppressWarnings("unused") for generated javacc and jflex code. Enable a few other easy ecj checks such as Deprecated annotation, hashcode/equals, equals across different types.
Co-authored-by: Mike McCandless <mikemccand@apache.org>
The profiler should only be invoked once at the end of the build. During
refactoring the buildFinished() hook became nested underneath stuff such
as allProjects which causes it to run too many times.
* Updated SOLR-8138 files for Solr 9.
This code was mostly written by Michael Suzuki, i just tweaked it to load, and updated the version of ui-grid to the 4.10 version.
* unused file, we use the .min version.
* add an entry for the ui-grid project to license file.
Co-authored-by: epugh@opensourceconnections.com <>
Upgrade from icu 62.2 to 68.2, with Unicode 13 support.
Modify GenerateUTR30DataFiles to take the release tag as a program
argument. Gradle populates this automatically, removing a manual step
from regeneration process.
* Creating Scripting contrib module to centralize the less secure code related to scripts.
* tweak the changelog and update notice to explain why the name changed and the security posture thinking
* the test script happens to be a currency.xml, which made me think we were doing something specific to currency types, but instead any xml formatted file will suffice for the test.
* Update solr/contrib/scripting/src/java/org/apache/solr/scripting/update/ScriptUpdateProcessorFactory.java
* Update solr/contrib/scripting/src/java/org/apache/solr/scripting/update/package-info.java
* drop the ing, and be more specific on the name of the ref guide page
* comment out the script update chain.
The sample techproducts configSet is used by many of the solr unit tests, and by default doesn't have access to the jar file in the contrib module. This is commented out, similar to how the lang contrib is.
* using a Mock for the script processor in order to keep the trusted configSets tests all together.
* tweak since we are using a mock script processor
Co-authored-by: David Smiley <dsmiley@apache.org>
When we are creating a new thread we should give it a descriptive name and enforce this via ForbiddenAPIs. This doesn't apply to Runnable or Callable objects that we pass to an executor, since those should be getting named by the executor itself.
We don't require this in tests because the tests should be more self contained and there is less benefit in descriptive names. If somebody is already profiling a test, then they likely have the context to understand what the unnamed threads are doing, whereas a thread dump from a running Solr instance should have good thread names for everything. This is especially helpful when doing profiling, otherwise we end up with a bunch of Thread-# that are hard to tell apart and search on.
* Removed docker plugin from gradle builds.
* Removed package docker image.
* Tasks now have correct inputs/outputs/dependencies.
* Move gradle help text to docker folder.
* Reduce duplicated Docker layer by doing file removal and chmod in another stage.
Co-authored-by: David Smiley <dsmiley@apache.org>
* Creating Scripting contrib module to centralize the less secure code related to scripts.
* tweak the changelog and update notice to explain why the name changed and the security posture thinking
* the test script happens to be a currency.xml, which made me think we were doing something specific to currency types, but instead any xml formatted file will suffice for the test.
* drop the ing, and be more specific on the name of the ref guide page
* use the same name everywhere
Co-authored-by: David Smiley <dsmiley@apache.org>
* Reduced dependencies from Solr server down to just SolrJ. Don't add WEB-INF/lib.
* Was missing some dependencies in lib/; now has all except SolrJ & logging.
* Can run via gradle, "gradlew run"
* Has own log4j2.xml now
Has own CHANGES.md now.
* Build Lucene binary distribution using Gradle
* Generate SHA-512 checksums for all release artifacts
* Update documentation artifacts included in binaries
* Delete some additional Ant relics
Co-authored-by: Dawid Weiss <dawid.weiss@carrotsearch.com>
Co-authored-by: Uwe Schindler <uschindler@apache.org>
This has the same logic as the previous python, but no longer relies
upon parsing HTML output, instead using java's doclet processor.
The errors are reported like "normal" javadoc errors with source file
name and line number and happen when running "gradlew javadoc"
Although the "rules" are the same as the previous python, the python had
some bugs where the checker didn't quite do exactly what we wanted, so
some fixes were applied throughout.
Co-authored-by: Dawid Weiss <dawid.weiss@carrotsearch.com>
Co-authored-by: Uwe Schindler <uschindler@apache.org>
* Remove DIH example directory
* Remove contrib code directories
* Remove contrib package related configurations for build tools
* Remove mention of DIH example
* remove dih as build dependencies and no-longer needed version pins
* Remove README references to DIH
* Remove dih mention from the script that probably does need to exist at all
* More build artifact references
* More removed dependencies leftovers (licenses/versions)
* No need to smoke exclude DIH anymore
* Remove Admin UI's DIH integration
* Remove DIH from shortname package list
* Remove unused DIH (related? not?) dataset
Unclear what is happening here, but there is no reference to that directory anywhere else
The other parallel directories ARE referenced in a TestConfigSetsAPI.java
* Hidden Idea files references
* No DIH to ignore anymore
* Remove last Derby DB references
* Remove DIH from documentation
Add the information in Major Changes document with the link to the external repo
* Added/updated a mention to CHANGES
* Fix leftover library mentions
* Fix Spellings
* LUCENE-9382: update gradle to 6.4.1. Requires minor changes around log call validation by restructuring the code around switches to a series of ifs. Piggybacking: apply log validation to :solr modules. Add inputs declaration so that task is not re-run on unmodified files.
This also automatically collects linked projects by its dependencies, so we don't need to maintain all inter-project javadocs links.
Co-authored-by: Dawid Weiss <dweiss@apache.org>
* Sync French stop words with latest version from Snowball.
This new version removed some French homonyms from the list
* Use latest master commit from snowball-website
* LUCENE-9354: regenerate with 'gradle snowball
* LUCENE-9354: add CHANGES.txt entry
ASF Release Policy states that we cannot have binary JAR files checked
in to our source releases, a few other projects have solved this by
modifying their generated gradlew scripts to download a copy of the
wrapper jar.
We now have a version and checksum file in ./gradle/wrapper directory
used for verifying the wrapper jar, and will take advantage of single
source java execution to verify and download.
The gradle wrapper jar will continue to be available in the git
repository, but will be excluded from src tarball generation. This
should not change workflows for any users, since we expect the gradlew
script to get the jar when it is missing.
Co-authored-by: Dawid Weiss <dweiss@apache.org>
Die, python2, die.
Some generated .java files change (parameterized automata for
spell-correction).
This is because the order of python dictionaries was not well-defined
previously. A sort() was added so that the python code now generates
reproducible output (Thanks @mikemccand).
So we'll suffer a change once, but the automata are equivalent. If you
run the script again you should not see source code changes.
The relevant unit tests are exhaustive (if you trust the paper!), so we can
be confident it does not break things, even though it looks very scary.
On newer linux distros, at least, 'python' now means python3. So
we can't rely on what version of python it will invoke (at least for a
few years).
For example in Fedora Linux:
https://fedoraproject.org/wiki/Changes/Python_means_Python3
For python2.x code, explicitly call 'python2.7' and for python3.x code,
explicitly call 'python3'.
Ant variable names are cleaned up, e.g. 'python.exe' is renamed to
'python2.exe' and 'python32.exe' is renamed to 'python3.exe'. This also
makes it easy to identify remaining python 2.x code that should be
migrated to python 3.x
Previous situation:
* The snowball base classes (Among, SnowballProgram, etc) had accumulated local performance-related changes. There was a task that would also "patch" generated classes (e.g. GermanStemmer) after-the-fact.
* Snowball classes had many "non-changes" from the original such as removal of tabs addition of javadocs, license headers, etc.
* Snowball test data (inputs and expected stems) was incorporated into lucene testing, but this was maintained manually. Also files had become large, making the test too slow (Nightly).
* Snowball stopwords lists from their website were manually maintained. In some cases encoding fixes were manually applied.
* Some generated stemmers (such as Estonian and Armenian) exist in lucene, but have no corresponding `.sbl` file in snowball sources at all.
Besides this mess, snowball project is "moving along" and acquiring new languages, adding non-BSD-licensed test data, huge test data, and other complexity. So it is time to automate the integration better.
New situation:
* Lucene has a `gradle snowball` regeneration task. It works on Linux or Mac only. It checks out their repos, applies the `snowball.patch` in our repository, compiles snowball stemmers, regenerates all java code, applies any adjustments so that our build is happy.
* Tests data is automatically regenerated from the commit hash of the snowball test data repository. Not all languages are tested from their data: only where the license is simple BSD. Test data is also (deterministically) sampled, so that we don't have huge files. We just want to make sure our integration works.
* Randomized tests are still set to test every language with generated fake words. The regeneration task ensures all languages get tested (it writes a simple text file list of them).
* Stopword files are automatically regenerated from the commit hash of the snowball website repository.
* The regeneration procedure is idempotent. This way when stuff does change, you know exactly what happened. For example if test data changes to a different license, you may see a git deletion. Or if a new language/stopwords/test data gets added, you will see git additions.
Java 13 adds a new doclint check under "accessibility" that the html
header nesting level isn't crazy.
Many are incorrect because the html4-style javadocs had horrible
font-sizes, so developers used the wrong header level to work around it.
This is no issue in trunk (always html5).
Java recommends against using such structured tags at all in javadocs,
but that is a more involved change: this just "shifts" header levels
in documents to be correct.
Current javadocs declare an HTML5 doctype: !DOCTYPE HTML. Some HTML5
features are used, but unfortunately also some constructs that do not
exist in HTML5 are used as well.
Because of this, we have no checking of any html syntax. jtidy is
disabled because it works with html4. doclint is disabled because it
works with html5. our docs are neither.
javadoc "doclint" feature can efficiently check that the html isn't
crazy. we just have to fix really ancient removed/deprecated stuff
(such as use of tt tag).
This enables the html checking in both ant and gradle. The docs are
fixed via straightforward transformations.
One exception is table cellpadding, for this some helper CSS classes
were added to make the transition easier (since it must apply padding
to inner th/td, not possible inline). I added TODOs, we should clean
this up. Most problems look like they may have been generated from a
GUI or similar and not a human.