lucene

Commit Graph

Author	SHA1	Message	Date
Dawid Weiss	3a8ed5e8ed	LUCENE-9134: add python-based regeneration of HTMLCharacterEntities.jflex inside jflexHTMLStripCharFilter.	2020-01-30 13:48:16 +01:00
Dawid Weiss	043dd207b6	LUCENE-9080: this jflex file got corrupted somehow during previous commit. I regenerated it with ant, along with the final java file. I also added a crlf normalization, encoding and forced-regeneration to ant because it didn't work before.	2020-01-30 13:09:47 +01:00
Adrien Grand	13e2094804	LUCENE-4702: Improve performance for fuzzy queries. Fuzzy queries with an edit distance of 1 or 2 must visit all blocks whose prefix length is 1 or 2. By not compressing those, we can trade very little space (a couple MBs in the case of the wikibigall index) for better query efficiency.	2020-01-30 10:37:39 +01:00
Ignacio Vera	a9482911a8	LUCENE-9141: Simplify LatLonShapeXQuery API by adding a new abstract class called LatLonGeometry. (#1170)	2020-01-30 08:03:22 +01:00
Robert Muir	29469b454f	LUCENE-9192: speed up more slow tests	2020-01-29 14:31:32 -05:00
Ignacio Vera	c98229948a	LUCENE-9152: Improve line intersection detection for polygons (#1187)	2020-01-29 19:24:51 +01:00
Dawid Weiss	e25dac085f	LUCENE-9134: this adds initial javacc support (without follow-up tweaks required to make the sources identical as those generated by ant).	2020-01-29 17:02:59 +01:00
Adrien Grand	7941d109bd	SOLR-13897: Fix precommit.	2020-01-28 20:11:47 +01:00
Adrien Grand	92b684c647	LUCENE-9161: DirectMonotonicWriter checks for overflows. (#1197)	2020-01-28 19:06:53 +01:00
Adrien Grand	6eb8834a57	LUCENE-4702: Reduce terms dictionary compression overhead. (#1216) Changes include: - Removed LZ4 compression of suffix lengths which didn't save much space anyway. - For stats, LZ4 was only really used for run-length compression of terms whose docFreq is 1. This has been replaced by explicit run-length compression. - Since we only use LZ4 for suffix bytes if the compression ratio is < 75%, we now only try LZ4 out if the average suffix length is greater than 6, in order to reduce index-time overhead.	2020-01-28 18:38:30 +01:00
Robert Muir	4773574578	LUCENE-9189: TestIndexWriterDelete.testDeletesOnDiskFull can run for minutes. The issue is that MockDirectoryWrapper's disk full check is horribly inefficient. On every writeByte/etc, it totally recomputes disk space across all files. This means it calls listAll() on the underlying Directory (which sorts all the underlying files), then sums up fileLength() for each of those files. This leads to many pathological cases in the disk full tests... but the number of tests impacted by this is minimal, and the logic is scary.	2020-01-28 12:24:31 -05:00
Robert Muir	3bcc97c8eb	LUCENE-9186: remove linefiledocs usage from BaseTokenStreamTestCase	2020-01-28 11:55:51 -05:00
Robert Muir	4350efa932	LUCENE-9187: remove too-expensive assert from LZ4 HighCompressionHashTable	2020-01-28 11:45:43 -05:00
Robert Muir	e504798a44	LUCENE-9185: add "tests.profile" to gradle build to aid fixing slow tests. Run test(s) with -Ptests.profile=true to print a histogram at the end of the build.	2020-01-28 11:27:18 -05:00
Cassandra Targett	1a14c67426	Ref Guide: Remove outdated or invalid links to Solr Wiki; update URL of those that remain	2020-01-27 16:38:31 -06:00
Cassandra Targett	b2f51f1941	Ref Guide: fix undefined substitution error caused by formatting of variables in paths	2020-01-27 16:38:30 -06:00
Jan Høydahl	53f7b394e4	SOLR-11207: Mute warnings for owasp false positives	2020-01-27 21:03:20 +01:00
Dawid Weiss	ff635cf701	LUCENE-9184, LUCENE-9183: allow skipping git status check in precommit with -Pvalidation.git.failOnModified=false (or place this in gradle.properties to make it permanent).	2020-01-27 20:47:02 +01:00
Uwe Schindler	7dc35e3a62	Let precommit depend on generic forbiddenApis task	2020-01-27 19:47:54 +01:00
Adrien Grand	9e4c445d17	LUCENE-4702: CHANGES entry.	2020-01-27 18:27:53 +01:00
Robert Muir	fd5a0ce7c2	LUCENE-9182: the rat-sources.gradle was the one .gradle file already with a license header, we don't need it twice	2020-01-27 12:11:44 -05:00
Robert Muir	975df9ddd3	LUCENE-9182: add apache license headers to all .gradle files and enforce in rat task	2020-01-27 12:05:34 -05:00
Dawid Weiss	b420ef8f77	LUCENE-9179: don't invoke the same build recursively upon first run, just continue. Seems like gradle bug but let's not cry about it - it just happens once and CI defaults can be passed independently on command-line.	2020-01-27 17:34:13 +01:00
Robert Muir	8e357b167b	LUCENE-9180: dos2unix files that don't need dos line endings	2020-01-27 11:29:59 -05:00
Dawid Weiss	a3b0cfcbe2	Moved under help/	2020-01-27 17:23:41 +01:00
Dawid Weiss	6bde0f3ec8	LUCENE-9134: UAX29URLEmailTokenizerImpl regeneration. This requires TONS of memory and time... insane compared to the size of the input. None of my machines pass it without at least 12 gigs of heap (!).	2020-01-27 12:36:13 +01:00
Jan Høydahl	39df74de37	SOLR-11207: Exclude configuration 'unifiedClasspath'. It is generated by consistent-versions plugin and triggers owasp warnings for deps even for excluded projects	2020-01-27 12:17:31 +01:00
Robert Muir	2bb63afdaf	LUCENE-9166: gradle build: test failures need stacktraces	2020-01-27 06:09:04 -05:00
Robert Muir	fddb5314fc	LUCENE-9172: nuke some compiler warnings	2020-01-27 06:08:30 -05:00
Robert Muir	5f964eeef2	SOLR-14217: tests respect tests.workDir correctly (prevent SSD destruction)	2020-01-27 06:07:48 -05:00
Alan Woodward	02f862670e	LUCENE-9153: Allow WhitespaceAnalyzer to set a custom maxTokenLen (#1198) WhitespaceTokenizer defaults to a maximum token length of 255, and WhitespaceAnalyzer does not allow this to be changed. This commit adds an optional maxTokenLen parameter to WhitespaceAnalyzer as well, and documents the existing token length restriction.	2020-01-27 09:22:25 +00:00
Jan Høydahl	9ddd05cd14	SOLR-11207: Exclude solr-ref-guide from owasp check. It picked up log4j1 dependency only used during build	2020-01-27 09:55:12 +01:00
Ignacio Vera	1fe4177ac0	LUCENE-9176: Handle the case when there is only one leaf node in TestEstimatePointCount (#1212)	2020-01-27 09:52:25 +01:00
Dawid Weiss	ae95f0ab68	LUCENE-9134: lucene:core:jflexStandardTokenizerImpl	2020-01-27 09:03:19 +01:00
Shalin Shekhar Mangar	776631254f	SOLR-13897: Fix unsafe publication of Terms object in ZkShardTerms that can cause visibility issues and race conditions under contention	2020-01-27 12:08:20 +05:30
Dawid Weiss	6f85ec0460	LUCENE-9174: Bump default gradle memory to 2g	2020-01-26 18:27:41 +01:00
Uwe Schindler	fd49c903b8	SOLR-14189: Add changes entry	2020-01-26 12:06:13 +01:00
andywebb1975	efd0e8f3e8	SOLR-14189 switch from String.trim() to StringUtils.isBlank() (#1172)	2020-01-26 12:03:39 +01:00
Uwe Schindler	0635756f76	Fix Windows Line endings in the source-patterns checker (silly bug: it's \r\n on windows not the other way round)	2020-01-26 11:48:24 +01:00
Dawid Weiss	5ab59f59ac	SOLR-11207: minor changes: - added 'owasp' task to the root project. This depends on dependencyCheckAggregate which seems to be a better fit for multi-module projects than dependencyCheckAnalyze (the difference is vague to me from plugin's documentation). - you can run the "gradlew owasp" task explicitly and it'll run the validation without any flags. - the owasp task is only added to check if validation.owasp property is true. I think this should stay as the default on non-CI systems (developer defaults) because it's a significant chunk of time it takes to download and validate dependencies. - I'm not sure all configurations should be included in the check... perhaps we should only limit ourselves to actual runtime dependencies not build dependencies, solr-ref-guide, etc.	2020-01-26 10:45:05 +01:00
Jan Høydahl	74a8d6d5ac	SOLR-11207: Add OWASP dependency checker to gradle build (#1121) * SOLR-11207: Add OWASP dependency checker to gradle build	2020-01-26 10:01:51 +01:00
Gus Heck	127ce3e360	SOLR-13749 adjust changes to reflect backport to 8.5	2020-01-25 00:53:27 -05:00
Cassandra Targett	74e88deba7	Revert "SOLR-12930: move Gradle docs from ./help/ to new ./dev-docs/ directory" This reverts commit `2d8650d36c`.	2020-01-24 15:56:00 -06:00
Cassandra Targett	ba77a5f2eb	SOLR-14214: Clean up client lists and references	2020-01-24 15:46:30 -06:00
Mike	eaa3dbe440	SOLR-14162 TestInjection can leak Timer objects (#1137)	2020-01-24 14:04:22 -06:00
Paul Merlin	24f7a28ac1	Add Github Workflow for Gradle Wrapper Validation (#1207)	2020-01-24 20:42:30 +01:00
Robert Muir	f5e9bb9493	LUCENE-9165: explicitly cast with the horrible groovy language so that numbers above 9 don't fail	2020-01-24 09:53:47 -05:00
Robert Muir	c53cc3edaf	LUCENE-9167: test speedup for slowest/pathological tests (round 3)	2020-01-24 08:58:59 -05:00
Robert Muir	4d61e4aaab	change generate-defaults.gradle not to cap testsJvms at 4	2020-01-24 08:49:17 -05:00
Adrien Grand	b283b8df62	LUCENE-4702: Terms dictionary compression. (#1126) Compress blocks of suffixes in order to make the terms dictionary more space-efficient. Two compression algorithms are used depending on which one is more space-efficient: - LowercaseAsciiCompression, which applies when all bytes are in the `[0x1F,0x3F)` or `[0x5F,0x7F)` ranges, which notably include all digits, lowercase ASCII characters, '.', '-' and '_', and encodes 4 chars on 3 bytes. It is very often applicable on analyzed content and decompresses very quickly thanks to auto-vectorization support in the JVM. - LZ4, when the compression ratio is less than 0.75. I was a bit unhappy with the complexity of the high-compression LZ4 option, so I simplified it in order to only keep the logic that detects duplicate strings. The logic about what to do in case overlapping matches are found, which was responsible for most of the complexity while only yielding tiny benefits, has been removed.	2020-01-24 14:46:57 +01:00
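
Note on commit b283b8df62: the two byte ranges named in the message, `[0x1F,0x3F)` and `[0x5F,0x7F)`, each hold 32 values, so together they fit in 6 bits, and four 6-bit codes fill exactly 3 bytes. The sketch below illustrates only that arithmetic. It is not Lucene's LowercaseAsciiCompression implementation. The class name `SixBitPackingSketch`, the method names, and the restriction to inputs whose length is a multiple of 4 are all assumptions made for illustration.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only; NOT the code from LUCENE-4702.
public class SixBitPackingSketch {

  /** True if b lies in [0x1F,0x3F) or [0x5F,0x7F), the ranges quoted in the commit message. */
  static boolean isCompressible(byte b) {
    int v = b & 0xFF;
    return (v >= 0x1F && v < 0x3F) || (v >= 0x5F && v < 0x7F);
  }

  /** Maps a compressible byte to a 6-bit code: first range to 0..31, second range to 32..63. */
  static int toCode(byte b) {
    int v = b & 0xFF;
    return v < 0x3F ? v - 0x1F : v - 0x5F + 32;
  }

  /** Inverse of toCode. */
  static byte fromCode(int code) {
    return (byte) (code < 32 ? code + 0x1F : code - 32 + 0x5F);
  }

  /** Packs every 4 input bytes into 3 output bytes. Assumes in.length % 4 == 0 for simplicity. */
  static byte[] pack(byte[] in) {
    if (in.length % 4 != 0) {
      throw new IllegalArgumentException("sketch only handles lengths that are multiples of 4");
    }
    byte[] out = new byte[in.length / 4 * 3];
    for (int i = 0, o = 0; i < in.length; i += 4, o += 3) {
      int bits = 0;
      for (int j = 0; j < 4; j++) {
        if (!isCompressible(in[i + j])) {
          throw new IllegalArgumentException("byte outside the compressible ranges");
        }
        bits = (bits << 6) | toCode(in[i + j]); // accumulate 4 x 6 = 24 bits
      }
      out[o] = (byte) (bits >>> 16);
      out[o + 1] = (byte) (bits >>> 8);
      out[o + 2] = (byte) bits;
    }
    return out;
  }

  /** Reverses pack: every 3 packed bytes expand back to 4 original bytes. */
  static byte[] unpack(byte[] packed) {
    if (packed.length % 3 != 0) {
      throw new IllegalArgumentException("packed length must be a multiple of 3");
    }
    byte[] out = new byte[packed.length / 3 * 4];
    for (int i = 0, o = 0; i < packed.length; i += 3, o += 4) {
      int bits = ((packed[i] & 0xFF) << 16) | ((packed[i + 1] & 0xFF) << 8) | (packed[i + 2] & 0xFF);
      for (int j = 3; j >= 0; j--) {
        out[o + j] = fromCode(bits & 0x3F); // lowest 6 bits belong to the last character
        bits >>>= 6;
      }
    }
    return out;
  }

  public static void main(String[] args) {
    byte[] term = "lucene_9.4-x".getBytes(StandardCharsets.US_ASCII);
    byte[] packed = pack(term);
    System.out.println(term.length + " -> " + packed.length + " bytes");
    System.out.println(new String(unpack(packed), StandardCharsets.US_ASCII));
  }
}
```

Uppercase letters fall outside both ranges, which is consistent with the message's remark that the scheme is often applicable to analyzed (typically lowercased) content. How the real implementation handles bytes outside the ranges or trailing bytes is not stated in the commit message and is not modeled here.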

33250 Commits All Branches Search