507 Commits

Author SHA1 Message Date
Uwe Schindler 08a13ce589 Upgrade forbiddenapis to hotfix release 3.0.1 (allows upgrade to commons-io 2.7 in Solr) 2020-06-04 01:01:42 +02:00
Uwe Schindler 64eed9a1a6
LUCENE-9347: Add support for forbiddenapis 3.0 (#1459)
LUCENE-9347: Add support for forbiddenapis 3.0
2020-04-27 11:54:59 +02:00
Mike Drob 46d011645c LUCENE-9170: Use HTTPS when downloading wagon-ssh artifacts
Co-authored-by: Ishan Chattopadhyaya <ishan@apache.org>
2020-04-01 11:40:58 +01:00
Robert Muir 9302eee1e0
LUCENE-9235: upgrade all python to python3
Die, python2, die.

Some generated .java files change (parameterized automata for
spell-correction).

This is because the order of python dictionaries was not well-defined
previously. A sort() was added so that the python code now generates
reproducible output (Thanks @mikemccand).

So we'll suffer a change once, but the automata are equivalent. If you
run the script again you should not see source code changes.

The relevant unit tests are exhaustive (if you trust the paper!), so we can
be confident it does not break things, even though it looks very scary.
2020-02-20 21:27:38 -05:00
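
The reproducibility fix described in the commit above can be illustrated with a minimal Python sketch. This is not the actual Lucene generator script: the function name `emit_transitions` and the sample mapping are hypothetical, chosen only to show why sorting before emitting code keeps generated source stable across runs.

```python
def emit_transitions(table):
    """Render a mapping as Java-like source lines in a stable order.

    `table` is a hypothetical {state_name: next_state} mapping; the real
    generator's data structures are not shown in the commit message.
    """
    lines = []
    # sorted() fixes the iteration order regardless of insertion order
    for state in sorted(table):
        lines.append(f"case {state}: return {table[state]};")
    return "\n".join(lines)


if __name__ == "__main__":
    a = emit_transitions({"s2": 0, "s0": 1, "s1": 2})
    b = emit_transitions({"s1": 2, "s2": 0, "s0": 1})
    # Both calls produce identical text because keys are sorted first.
    print(a == b)
```

Without the `sorted()` call, the two inputs would be emitted in different orders and the generated file would show spurious diffs.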
Robert Muir b9a569e7be
LUCENE-9230: explicitly call python version we want from builds
On newer linux distros, at least, 'python' now means python3. So
we can't rely on what version of python it will invoke (at least for a
few years).

For example in Fedora Linux:

https://fedoraproject.org/wiki/Changes/Python_means_Python3

For python2.x code, explicitly call 'python2.7' and for python3.x code,
explicitly call 'python3'.

Ant variable names are cleaned up, e.g. 'python.exe' is renamed to
'python2.exe' and 'python32.exe' is renamed to 'python3.exe'. This also
makes it easy to identify remaining python 2.x code that should be
migrated to python 3.x
2020-02-18 18:58:17 -05:00
Robert Muir 0203815ab2
LUCENE-9220: regenerate all stemmers/stopwords/test data from snowball 2.0 (#1262)
Previous situation:

* The snowball base classes (Among, SnowballProgram, etc) had accumulated local performance-related changes. There was a task that would also "patch" generated classes (e.g. GermanStemmer) after-the-fact.
* Snowball classes had many "non-changes" from the original, such as removal of tabs, addition of javadocs, license headers, etc.
* Snowball test data (inputs and expected stems) was incorporated into lucene testing, but this was maintained manually. Also files had become large, making the test too slow (Nightly).
* Snowball stopwords lists from their website were manually maintained. In some cases encoding fixes were manually applied.
* Some generated stemmers (such as Estonian and Armenian) exist in lucene, but have no corresponding `.sbl` file in snowball sources at all.

Besides this mess, snowball project is "moving along" and acquiring new languages, adding non-BSD-licensed test data, huge test data, and other complexity. So it is time to automate the integration better.

New situation:

* Lucene has a `gradle snowball` regeneration task. It works on Linux or Mac only. It checks out their repos, applies the `snowball.patch` in our repository, compiles snowball stemmers, regenerates all java code, applies any adjustments so that our build is happy.
* Test data is automatically regenerated from the commit hash of the snowball test data repository. Not all languages are tested from their data: only where the license is simple BSD. Test data is also (deterministically) sampled, so that we don't have huge files. We just want to make sure our integration works.
* Randomized tests are still set to test every language with generated fake words. The regeneration task ensures all languages get tested (it writes a simple text file list of them).
* Stopword files are automatically regenerated from the commit hash of the snowball website repository.
* The regeneration procedure is idempotent. This way when stuff does change, you know exactly what happened. For example if test data changes to a different license, you may see a git deletion. Or if a new language/stopwords/test data gets added, you will see git additions.
2020-02-17 12:38:01 -05:00
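
The deterministic sampling of test data mentioned in the commit above could look roughly like the following Python sketch. The commit message does not say how the sampling is implemented, so the fixed seed, the function name `sample_lines`, and the choice to preserve input order are all assumptions made for illustration.

```python
import random


def sample_lines(lines, k, seed=42):
    """Pick at most k lines, always the same ones for the same input.

    The seed value and this function are hypothetical; they only show
    that a privately seeded RNG makes the sample repeatable.
    """
    if len(lines) <= k:
        return list(lines)
    rng = random.Random(seed)  # private RNG, independent of global state
    # Sample indices, then sort them so the output keeps the input order.
    picked = sorted(rng.sample(range(len(lines)), k))
    return [lines[i] for i in picked]


if __name__ == "__main__":
    words = [f"word{i}" for i in range(1000)]
    first = sample_lines(words, 5)
    second = sample_lines(words, 5)
    print(first == second)
```

Because the sample depends only on the input and the seed, regenerating the files twice from the same upstream commit yields no diff, which fits the idempotency goal stated in the commit.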
Robert Muir f41eabdc5f
LUCENE-8279: fix javadocs wrong header levels and accessibility issues
Java 13 adds a new doclint check under "accessibility" that the html
header nesting level isn't crazy.

Many are incorrect because the html4-style javadocs had horrible
font-sizes, so developers used the wrong header level to work around it.
This is no issue in trunk (always html5).

Java recommends against using such structured tags at all in javadocs,
but that is a more involved change: this just "shifts" header levels
in documents to be correct.
2020-02-08 10:00:00 -05:00
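
The header "shift" described in the commit above can be sketched in Python as follows. This is an illustrative assumption about how such a transformation might be scripted, not the tool actually used for the change; `shift_headers` is a hypothetical helper, and it normalizes the tag name to lowercase.

```python
import re

# Matches the start of an opening or closing h1..h6 tag, e.g. "<h3" or "</h3".
_HEADER = re.compile(r"<(/?)h([1-6])\b", re.IGNORECASE)


def shift_headers(html, delta):
    """Shift every HTML header level by `delta`, clamped to the range 1..6."""

    def repl(match):
        level = min(6, max(1, int(match.group(2)) + delta))
        return f"<{match.group(1)}h{level}"

    return _HEADER.sub(repl, html)


if __name__ == "__main__":
    print(shift_headers('<h3 class="x">Usage</h3>', -1))
```

Attributes on the tag are left untouched because only the tag name and level are rewritten.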
Robert Muir 69f26d099e
LUCENE-9213: fix documentation-lint (and finally precommit) to work on java 12 and 13
the "missing javadocs" checker needed tweaks to work with the format
changes of java 13.

As a followup we may investigate javadoc (maybe the new doclet api). It
has its own missing checks too now, but they are black vs white (either
fully documented or not checked), whereas this python tool allows us to
"improve", e.g. enforce that all classes have doc, even if all
methods do not yet.
2020-02-07 17:18:26 -05:00
Robert Muir 0d339043e3
LUCENE-9209: fix javadocs to be html5, enable doclint html checks, remove jtidy
Current javadocs declare an HTML5 doctype: !DOCTYPE HTML. Some HTML5
features are used, but unfortunately also some constructs that do not
exist in HTML5 are used as well.

Because of this, we have no checking of any html syntax. jtidy is
disabled because it works with html4. doclint is disabled because it
works with html5. our docs are neither.

javadoc "doclint" feature can efficiently check that the html isn't
crazy. we just have to fix really ancient removed/deprecated stuff
(such as use of tt tag).

This enables the html checking in both ant and gradle. The docs are
fixed via straightforward transformations.

One exception is table cellpadding, for this some helper CSS classes
were added to make the transition easier (since it must apply padding
to inner th/td, not possible inline). I added TODOs, we should clean
this up. Most problems look like they may have been generated from a
GUI or similar and not a human.
2020-02-06 22:30:52 -05:00
Robert Muir 7f4560c59a
LUCENE-9199: allow building javadocs on java 13+ 2020-02-06 10:39:41 -05:00
Robert Muir 7382375d8a
support ECJ linting on newer JDK versions
The entire precommit task will still fail with unsupported java version
(subsequent checks do not support the newer javadocs format).

But this allows the ECJ linter to run, which checks for things such as
unused imports.
2020-01-31 14:16:04 -05:00
Dawid Weiss 043dd207b6 LUCENE-9080: this jflex file got corrupted somehow during previous commit. I regenerated it with ant, along with the final java file. I also added a crlf normalization, encoding and forced-regeneration to ant because it didn't work before. 2020-01-30 13:09:47 +01:00
Robert Muir 9dae566ee7
LUCENE-9160: add params/docs to override jvm params in gradle build, default C2 off in tests.
Adds some build parameters to tune how tests run. There is an example
shown by "gradle helpLocalSettings"

Default C2 off in tests as it is wasteful locally and causes slowdown of
test runs. You can override this by setting tests.jvmargs for gradle,
or args for ant.

Some crazy lucene stress tests may need to be toned down after the
change, as they may have been doing too many iterations by default...
but this is not a new problem.
2020-01-22 09:58:30 -05:00
Dawid Weiss 1e4565ce26 Don't delete jetty-start when regenerating sha checksums from ant. 2020-01-16 18:58:55 +01:00
Dawid Weiss 3008dd9526 Merge remote-tracking branch 'origin/master' into gradle-master 2020-01-13 17:55:53 +01:00
Dawid Weiss 7dc4df9524 LUCENE-9126: enable javadoc linting bypassing java bug. Corrected syntax errors so that validation passes but had to disable ALL html checks (tons of them). 2020-01-13 17:50:57 +01:00
Dawid Weiss d7c2e3029b Ignore gradle files for rat checks. 2019-12-13 13:44:18 +01:00
Robert Muir a6e7c770c2 SOLR-14064: remove some hadoop brain damage from build environment
Some permissions and build hacks were made on behalf of hadoop: hacks on
top of hacks. Now that the major problems such as classpath pollution and
hadoop test code are fixed, we can remove hacks built on top of them.
2019-12-13 03:54:06 -05:00
Robert Muir f894bd019e LUCENE-9090: remove ant runtime pollution from tests classpath
previously, the entire classpath of ant (ant itself, plugins, ivy, etc) was
polluting the unit tests classpath. it leads to non-reproducible build
issues because tests classpath is different depending on things outside
of source code control.

for example, solr tests launching hadoop, hadoop launching jetty, jetty
scanning classpath -> boom
2019-12-11 20:23:35 -05:00
Kevin Risden 7c8635d600
SOLR-14028: Fix test permissions for TestSolrCLIRunExample
Signed-off-by: Kevin Risden <krisden@apache.org>
2019-12-07 16:32:11 -05:00
Robert Muir c4126ef858 SOLR-14015: remove blanket filesystem read access from solr-tests.policy
Restrict this to only minimal paths like lucene. This is a defense against directory traversal attacks.
It will also help find bad bugs where things are reading filesystem in the wrong locations.
2019-12-04 23:16:19 -05:00
Kevin Risden abb7087f6e
LUCENE-9041: Upgrade ECJ to 3.19.0 to fix sporadic precommit javadoc issues
Lucene/Solr dev mail list post:
http://mail-archives.apache.org/mod_mbox/lucene-dev/201911.mbox/%3CCAJU9nmhzmvg1mWPup9%2Bg3V%3Dsbz18M2DLO-3asEqaUCQpcZHiYA%40mail.gmail.com%3E

Signed-off-by: Kevin Risden <krisden@apache.org>
2019-11-11 19:40:10 -05:00
Uwe Schindler 3f76432c68 Update forbiddenapis to v2.7 and Groovy to v2.4.17 2019-10-12 21:24:38 +02:00
Chris Hostetter bc0652ecc0 LUCENE-8991: disable HashMap assertions (by default) on java9 and java1.8 as well 2019-10-07 15:58:22 -07:00
Chris Hostetter 10da07a396 LUCENE-8991: disable java.util.HashMap assertions to avoid spurious failures due to JDK-8205399 2019-10-02 15:58:26 -07:00
Tomoko Uchida 98c85a0e1a LUCENE-8778: Define analyzer SPI names as static final fields and document the names in all analysis components. This also changes SPI loader to detect service names via the static NAME fields instead of class names. 2019-06-22 10:46:37 +09:00
Uwe Schindler c756b50ae4 LUCENE-8807: Change all download URLs in build files to HTTPS 2019-05-21 17:06:00 +02:00
Tomoko Uchida 1204327b56 LUCENE-8738: Force locale to be 'en_US' in javadocs task. 2019-04-21 21:28:14 +09:00
Uwe Schindler cd0706bd43 Revert previous change to use separate Groovy artifacts: Use groovy-all again (bugs in ivy) 2019-04-20 10:51:05 +02:00
Uwe Schindler a43fa13d11 Revert Groovy update and downgrade Groovy to 2.4.16, as new version is not compatible with Java 13 EA builds! 2019-04-20 02:57:26 +02:00
Uwe Schindler 52090c9b11 Update flexmark to latest version 2019-04-20 01:12:51 +02:00
Uwe Schindler f1911f82d5 LUCENE-8768: Fix Javadocs build in Java 11 2019-04-20 01:01:13 +02:00
Uwe Schindler 77e1bec7dc LUCENE-8738: Add missing dependency for Maven build 2019-04-20 00:51:55 +02:00
Uwe Schindler b8494c8702 LUCENE-8738: Update Groovy to make the warnings with Java 11 a bit more silent (only one warning on first Groovy invocation) 2019-04-20 00:09:56 +02:00
Uwe Schindler faaee86efb LUCENE-8738: Move to Java 11 as minimum Java version (merged branch: jira/LUCENE-8738)
Co-authored-by: Adrien Grand <jpountz@apache.org>
2019-04-16 14:00:09 +02:00
Uwe Schindler 2a1ed6e484 LUCENE-8729: Workaround to allow compile under JDK13+ 2019-03-19 19:05:15 +01:00
Steve Rowe 283b19a8da LUCENE-8527: Upgrade JFlex to 1.7.0. StandardTokenizer and UAX29URLEmailTokenizer now support Unicode 9.0, and provide UTS#51 v11.0 Emoji tokenization with the '<EMOJI>' token type. 2019-01-08 13:33:49 -05:00
markrmiller 16241f4484 LUCENE-8546: Fix ant beast to fail and succeed based on whether beasting actually fails or succeeds.
LUCENE-8541: Fix ant beast to not overwrite junit xml results for each beast.iters iteration.
2018-12-02 10:10:10 -06:00
Varun Thacker 3e87499d72 Tweak test-help example for running tests within a package 2018-10-24 15:33:23 -07:00
Jan Høydahl 03c9c04353 LUCENE-8493: Stop publishing insecure .sha1 files with releases 2018-09-26 15:31:26 +02:00
Uwe Schindler 4940b3666e LUCENE-8504: Update forbiddenapis to 2.6 2018-09-17 13:02:11 +02:00
Jan Høydahl 5b96f89d2b LUCENE-5143: Fix smoketester, fix RM PGP key check, fix solr DOAP file, add CHANGES entry
Remove unused/stale 'copy-to-stage' and '-dist-keys' targets from ant build
2018-09-11 22:39:19 +02:00
Dawid Weiss 6be01e2ade LUCENE-8485: Update randomizedtesting to version 2.6.4. 2018-09-05 11:51:02 +02:00
Erick 59550fc262 LUCENE-8455: Upgrade ECJ compiler to 4.6.1 in lucene/common-build.xml 2018-08-15 15:32:41 -07:00
Alexandre Rafalovitch b7d14c50fb SOLR-11694: Remove outdated UIMA module 2018-07-07 09:58:57 -04:00
Uwe Schindler 060d82af31 LUCENE-8230: Upgrade forbiddenapis to version 2.5 2018-03-28 20:06:54 +02:00
Jan Høydahl 9e780ba564 LUCENE-7935: Publish .sha512 hash files with the release artifacts 2018-03-28 09:14:03 +02:00
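
As a rough illustration of what publishing a `.sha512` file next to a release artifact involves, here is a minimal Python sketch. It is not the release tooling itself: the helper name `write_sha512` and the `<hex digest>  <file name>` line layout (the one used by the `sha512sum` utility) are assumptions, since the commit message does not specify the file format.

```python
import hashlib
from pathlib import Path


def write_sha512(artifact):
    """Write '<artifact>.sha512' beside the artifact and return its path.

    The output layout is an assumption modelled on sha512sum output.
    """
    path = Path(artifact)
    digest = hashlib.sha512()
    with path.open("rb") as handle:
        # Read in 1 MiB chunks so large archives are not loaded into memory.
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    out = path.with_name(path.name + ".sha512")
    out.write_text(f"{digest.hexdigest()}  {path.name}\n")
    return out
```

A consumer can then recompute the digest of the downloaded artifact and compare it with the first field of the `.sha512` file.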
Alan Woodward f4fb19fdc9 LUCENE-8224: Fix typo in error message 2018-03-26 16:32:28 +01:00
Alan Woodward f2e3b109e6 LUCENE-8224: Allow releases to be built with ant 1.10
Also adds a check to common-build.xml to fail early with ant 1.10.2, which
has a bug that prevents lucene from building.
2018-03-26 14:06:54 +01:00
Erick Erickson 624d128b5e SOLR-7887: Upgrade Solr to use log4j2 -- log4j 1 now officially end of life 2018-03-25 19:16:09 -07:00