lucene

Commit Graph

Author	SHA1	Message	Date
Robert Muir	0203815ab2	LUCENE-9220: regenerate all stemmers/stopwords/test data from snowball 2.0 (#1262) Previous situation: * The snowball base classes (Among, SnowballProgram, etc) had accumulated local performance-related changes. There was a task that would also "patch" generated classes (e.g. GermanStemmer) after-the-fact. * Snowball classes had many "non-changes" from the original such as removal of tabs addition of javadocs, license headers, etc. * Snowball test data (inputs and expected stems) was incorporated into lucene testing, but this was maintained manually. Also files had become large, making the test too slow (Nightly). * Snowball stopwords lists from their website were manually maintained. In some cases encoding fixes were manually applied. * Some generated stemmers (such as Estonian and Armenian) exist in lucene, but have no corresponding `.sbl` file in snowball sources at all. Besides this mess, snowball project is "moving along" and acquiring new languages, adding non-BSD-licensed test data, huge test data, and other complexity. So it is time to automate the integration better. New situation: * Lucene has a `gradle snowball` regeneration task. It works on Linux or Mac only. It checks out their repos, applies the `snowball.patch` in our repository, compiles snowball stemmers, regenerates all java code, applies any adjustments so that our build is happy. * Tests data is automatically regenerated from the commit hash of the snowball test data repository. Not all languages are tested from their data: only where the license is simple BSD. Test data is also (deterministically) sampled, so that we don't have huge files. We just want to make sure our integration works. * Randomized tests are still set to test every language with generated fake words. The regeneration task ensures all languages get tested (it writes a simple text file list of them). * Stopword files are automatically regenerated from the commit hash of the snowball website repository. * The regeneration procedure is idempotent. This way when stuff does change, you know exactly what happened. For example if test data changes to a different license, you may see a git deletion. Or if a new language/stopwords/test data gets added, you will see git additions.	2020-02-17 12:38:01 -05:00
Claire Pollard	188f620208	Update README.txt (#1090) Update the analysis-extras README to include reference to including solr-analysis-extras jar.	2020-02-15 22:57:46 +01:00
Kevin Risden	c4a8a77d23	SOLR-14209: Upgrade JQuery to 3.4.1 * JQuery 2.1.3 to 3.4.1 * jstree 1.0-rc1 to v3.3.8 Closes #1209 Signed-off-by: Kevin Risden <krisden@apache.org>	2020-02-08 11:57:56 -06:00
Robert Muir	f41eabdc5f	LUCENE-8279: fix javadocs wrong header levels and accessibility issues Java 13 adds a new doclint check under "accessibility" that the html header nesting level isn't crazy. Many are incorrect because the html4-style javadocs had horrible font-sizes, so developers used the wrong header level to work around it. This is no issue in trunk (always html5). Java recommends against using such structured tags at all in javadocs, but that is a more involved change: this just "shifts" header levels in documents to be correct.	2020-02-08 10:00:00 -05:00
Robert Muir	0d339043e3	LUCENE-9209: fix javadocs to be html5, enable doclint html checks, remove jtidy Current javadocs declare an HTML5 doctype: !DOCTYPE HTML. Some HTML5 features are used, but unfortunately also some constructs that do not exist in HTML5 are used as well. Because of this, we have no checking of any html syntax. jtidy is disabled because it works with html4. doclint is disabled because it works with html5. our docs are neither. javadoc "doclint" feature can efficiently check that the html isn't crazy. we just have to fix really ancient removed/deprecated stuff (such as use of tt tag). This enables the html checking in both ant and gradle. The docs are fixed via straightforward transformations. One exception is table cellpadding, for this some helper CSS classes were added to make the transition easier (since it must apply padding to inner th/td, not possible inline). I added TODOs, we should clean this up. Most problems look like they may have been generated from a GUI or similar and not a human.	2020-02-06 22:30:52 -05:00
Robert Muir	975df9ddd3	LUCENE-9182: add apache license headers to all .gradle files and enforce in rat task	2020-01-27 12:05:34 -05:00
Kevin Risden	9b6fc1b9fc	SOLR-14132: Upgrade Angular JS 1.3.8 to 1.7.9 * Upgrade Angular JS 1.3.8 to 1.7.9 * Upgrade Angular Chosen v1.3.0 and Chosen to v1.8.7 * Remove older jquery 1.7.2 version * Remove non minified Angular JS files Closes #1196 Signed-off-by: Kevin Risden <krisden@apache.org>	2020-01-23 09:20:12 -05:00
Shalin Shekhar Mangar	04193d5252	SOLR-14207: Fix logging statements with less or more arguments than placeholders	2020-01-23 14:00:08 +05:30
Houston Putman	ffba54a827	SOLR-11746: Adding existence queries for PointFields * DocValuesFieldExistsQuery and NormsFieldExistsQuery are used for existence queries when possible. * Added documentation on the difference between field:* and field:[* TO *]	2020-01-22 18:00:55 -05:00
Dawid Weiss	f9dde4de52	Merge remote-tracking branch 'origin/master' into gradle-master	2020-01-13 08:37:15 +01:00
Erick Erickson	3bae63d215	LUCENE-9080: Upgrade ICU4j to 62.2 and make regenerate work	2020-01-12 17:12:57 -05:00
Dawid Weiss	0674fada65	Merge remote-tracking branch 'origin/master' into gradle-master	2020-01-09 11:56:02 +01:00
Houston Putman	08b64aab8f	Revert "SOLR-11746: Existence query support for numeric point fields" This reverts commit `f5ab3ca688`.	2020-01-08 18:33:15 -05:00
Dawid Weiss	405d227c55	Merge remote-tracking branch 'origin/master' into gradle-master	2020-01-07 08:45:12 +01:00
Houston Putman	f5ab3ca688	SOLR-11746: Existence query support for numeric point fields	2020-01-06 12:12:22 -05:00
Dawid Weiss	0fce50593b	Add commons-csv to extraction deps.	2019-12-25 19:55:27 +01:00
Dawid Weiss	206d62b9d5	Merge remote-tracking branch 'origin/master' into gradle-master	2019-12-18 15:10:04 +01:00
Robert Muir	612cba38ca	SOLR-14110: sandbox javax.script usage in tests	2019-12-18 06:30:24 -05:00
Dawid Weiss	28b19c2af2	Merge with master.	2019-12-18 09:32:35 +01:00
Tim Allison	279a391cf3	SOLR-14054 -- upgrade to Tika 1.23 (and its dependencies) (#1092) * SOLR-14054 -- upgrade to Tika 1.23 (and its dependencies) * fix CHANGES.txt file	2019-12-17 16:09:08 -05:00
Dawid Weiss	4c94a13e69	Merge remote-tracking branch 'origin/master' into gradle-master	2019-12-17 13:38:14 +01:00
Robert Muir	dc35e5752b	LUCENE-9094: Ban ObjectInputStream and ObjectOutputStream in forbidden-apis	2019-12-16 13:31:11 -05:00
Dawid Weiss	6094d4dd13	Merge remote-tracking branch 'origin/master' into gradle-master	2019-12-12 14:16:48 +01:00
noble	b35f1debe3	SOLR-14013: javabin performance regressions	2019-12-12 23:26:37 +11:00
Ishan Chattopadhyaya	57e717eff2	SOLR-14065: Deprecate Velocity	2019-12-12 16:13:32 +05:30
Mikhail Khludnev	f01b3e97d1	SOLR-13904: Make Analytics component sensitive to timeAllowed.	2019-12-11 23:48:17 +03:00
Robert Muir	dc031ea382	SOLR-14050: clean up tests use of network addresses Solr tests now have a similar policy to Lucene, loopback use only. If a test tries to resolve or connect to the internet, it will get SecurityException. Some solr tests explicitly try to talk to dead nodes with real networking. This is not good and asking for trouble, but use low loopback port numbers instead of multicast addresses. The idea is that it fails faster. Move these to constants so that stuff isn't copy-pasted everywhere, in case we have to do something different later.	2019-12-11 12:51:45 -05:00
Erik Hatcher	128360856d	SOLR-14025: VelocityResponseWriter hardening	2019-12-11 12:36:14 -05:00
Dawid Weiss	27d5509644	Merge remote-tracking branch 'origin/master' into gradle-master	2019-12-11 08:57:18 +01:00
Erick Erickson	d189520935	SOLR-13953: Prometheus exporter in SolrCloud mode limited to 100 nodes	2019-12-10 20:19:30 -05:00
Dawid Weiss	1a24ccb4ee	Merge remote-tracking branch 'origin/master' into gradle-master	2019-12-05 11:17:34 +01:00
Robert Muir	e77027dd8c	SOLR-13993: sandbox velocity template render (if security manager is enabled) The solr permissions are weak sauce due to the huge number of features, third-party dependencies, etc. Hence they have access to do many things. For "scripting" such as velocity we have to look at a more aggressive stance: Step 1: Can we wrap a sandbox around the whole goddamn thing and call it a day? Step 2: Let's separate the "engine" from "untrusted code" and only be an asshole to the latter. Step 3: Java's security is shit, Lets contain that classloader and whitelist access.	2019-12-05 01:06:38 -05:00
Robert Muir	165529767b	SOLR-14000: clean up more static field leaks in tests On windows, these objects can't be inspected due to security restrictions. So the test runner fails the tests since it does not know how big the leak is.	2019-12-03 18:51:00 -05:00
Dawid Weiss	7c26c6de02	Merge remote-tracking branch 'origin/master' into gradle-master	2019-12-03 18:45:12 +01:00
Robert Muir	9e5d11be8a	fix static leaks, null stuff out in afterclass	2019-12-03 06:28:19 -05:00
Dawid Weiss	d4a9842375	Initial gradle build layer.	2019-12-02 15:34:57 +01:00
David Smiley	6a72b81ed3	SOLR-13971: Revert changes to the default configset. * clarified these are Java system properties * trivial dead code change; Boolean.getBoolean returns a primitive	2019-11-28 10:45:58 -05:00
Ishan Chattopadhyaya	212593d362	SOLR-13971: Renamed the velocity template parameter names	2019-11-28 15:46:26 +05:30
Ishan Chattopadhyaya	50e8cea918	SOLR-13971: Removing velocity from _default and disabling custom template support by default	2019-11-28 07:52:43 +05:30
Dawid Weiss	063c82ebd6	SOLR-13952: reverting Erick's commit (with permission).	2019-11-25 17:56:20 +01:00
Erick Erickson	4b34d726ab	SOLR-13952: Separate out Gradle-specific code from other (mostly test) changes and commit separately	2019-11-24 13:24:40 -05:00
Andrzej Bialecki	b4fe911cc8	SOLR-13817: Remove legacy SolrCache implementations.	2019-11-14 21:21:44 +01:00
Andrzej Bialecki	e58a90f18d	SOLR-13858: Clean up SolrInfoBean / SolrMetricProducer API.	2019-11-04 15:31:43 +01:00
Mikhail Khludnev	afdb80069c	SOLR-13824: reject prematurely closed curly bracket in JSON.	2019-10-21 23:25:06 +03:00
Andrzej Bialecki	f07998fc23	SOLR-13677: All Metrics Gauges should be unregistered by components that registered them.	2019-10-18 17:15:04 +02:00
Koen De Groote	e7e6cfaecf	LUCENE-8994: Code Cleanup - Pass values to list constructor instead of empty constructor followed by addAll(). (#919)	2019-10-14 18:45:47 +02:00
Koen De Groote	04786244d0	LUCENE-8979: Code Cleanup: Use entryset for map iteration wherever possible. - part 2	2019-10-14 18:36:19 +02:00
Chris Hostetter	4ec4061cbc	SOLR-13786: AwaitsFix SolrExporterIntegrationTest	2019-09-23 10:33:08 -07:00
Dawid Weiss	2a1d5eea42	SOLR-13779: Use the safe fork of simple-xml for clustering contrib	2019-09-19 12:24:26 +02:00
Houston Putman	c7f8487328	SOLR-13773: Prometheus Exporter GC and Heap options (#887) * SOLR-13773: Prometheus Exporter GC and Heap options * Adding info to the ref-guide.	2019-09-18 13:31:53 -07:00

1189 Commits