Commit Graph

33173 Commits

Author SHA1 Message Date
Adrien Grand 136dcbdbbc
LUCENE-9147: Move the stored fields index off-heap. (#1179)
This replaces the index of stored fields and term vectors with two
`DirectMonotonic` arrays. `DirectMonotonicWriter` requires to know the number
of values to write up-front, so incoming doc IDs and file pointers are buffered
on disk using temporary files that never get fsynced, but have index headers
and footers to make sure any corruption in these files wouldn't propagate to the
index.

`DirectMonotonicReader` gets a specialized `binarySearch` implementation that
leverages the metadata in order to avoid going to the IndexInput as often as
possible. Actually in the common case, it would only go to a single
sub `DirectReader` which, combined with the size of blocks of 1k values, helps
bound the number of page faults to 2.
2020-02-05 18:35:08 +01:00
Adrien Grand fe349ddcf2
SOLR-14242: HdfsDirectory#createTempOutput. (#1240) 2020-02-05 16:38:53 +01:00
Mike McCandless 47386f8cca LUCENE-9200: consistently use double (not float) math for TieredMergePolicy's decisions, to fix a corner-case bug uncovered by randomized tests 2020-02-05 09:51:31 -05:00
Adrien Grand 2d8428ec2e
SOLR-14238: Fix HdfsDirectory to no longer overwrite existing files. (#1237) 2020-02-04 19:35:15 +01:00
Tomas Fernandez Lobbe bb90569f1d
SOLR-14219: Revert changes in OverseerSolrRespose and move serialization (#1227)
SOLR-14095 Introduced an issue for rolling restarts (Incompatible Java serialization). This change fixes the compatibility issue while keeping the functionality in SOLR-14095
2020-02-04 10:26:57 -08:00
Munendra S N c91dd9d0e4 SOLR-14090: fix delete-copy-field when source is dynamic field 2020-02-04 21:33:31 +05:30
Munendra S N 4eff9c9b5e SOLR-10567: add support for DateRangeField in JSON facet range 2020-02-04 21:26:40 +05:30
Erick Erickson b0bb299dc4
LUCENE-9134: Port ant-regenerate tasks to Gradle build (#1230)
LUCENE-9134: Port ant-regenerate tasks to Gradle build (Solr javacc)
2020-02-04 09:16:38 -05:00
Ignacio Vera 641680fbf1
LUCENE-9197: fix wrong implementation on Point2D#withinTriangle (#1228) 2020-02-04 07:10:08 +01:00
Erick Erickson d3ac1329a3
LUCENE-8656: Deprecations in FuzzyQuery (#1229)
LUCENE-8656: Deprecations in FuzzyQuery

Closes #1229
2020-02-03 08:52:33 -05:00
Mikhail Khludnev d8bc9bcfcf SOLR-12325: uniqueBlock(\{!v=foo:bar}) 2020-02-02 15:15:35 +03:00
Jan Høydahl 16b8d50284
SOLR-14221: Upgrade restlet to version 2.4.0 (#1211) 2020-02-02 11:35:14 +01:00
Kazuaki Hiraga b457c2ee2e LUCENE-9123: Add new JapaneseTokenizer constructors with discardCompoundToken option to control whether the tokenizer emits original tokens when the mode is not NORMAL. 2020-02-01 14:51:09 +09:00
Munendra S N a2c53dad72 fix typo in schema-api documentation 2020-02-01 10:21:52 +05:30
Erick Erickson 5253c0cb74
LUCENE-9134 Port ant-regenerate tasks to Gradle build (#1226)
LUCENE-9134: Port ant-regenerate tasks to Gradle build Javacc sub-task. Closes #1226
2020-01-31 17:04:10 -05:00
Robert Muir 7382375d8a
support ECJ linting on newer JDK versions
The entire precommit task will still fail with unsupported java version
(subsequent checks do not support the newer javadocs format).

But this allows the ECJ linter to run, which checks for things such as
unused imports.
2020-01-31 14:16:04 -05:00
Joel Bernstein db78f6cd00 SOLR-14139: Support backtick phrase queries in Streaming Expressions 2020-01-31 11:54:14 -05:00
Christine Poerschke 0c1b19a321 LUCENE-8530: fix some 'rawtypes' javac warnings 2020-01-31 16:40:55 +00:00
Jason Gerlowski 719b38c8d8
SOLR-13892: Add 'top-level' docValues Join implementation (#1171) 2020-01-31 11:21:01 -05:00
Robert Muir 9ceaff913e
LUCENE-9195: more slow tests fixes 2020-01-31 07:57:34 -05:00
Robert Muir ed7f507c3c
LUCENE-9193: fix documentation typo for gradle tests 2020-01-30 23:54:31 -05:00
Chris Hostetter 517438e356 New /stream test cases showing authn+authz edge cases in cloud mode
This triggers various places in the Streaming Expressions code that use background threads
to confirm that the expected credentails (or lack of) are propogarded along.

Test currently has comments + workarounds for 2 known client issues:
 - SOLR-14226: SolrStream reports AuthN/AuthZ failures (401|403) as IOException w/o details
 - SOLR-14222: CloudSolrClient converts (update) 403 error to 500 error
2020-01-30 10:01:03 -07:00
Robert Muir 4b5105e167
LUCENE-9193: heap allocations for tests.profile
Can be a bit noisier than cpu sampling, due to how threads are allocated
in tests... maybe we can improve that in the future.
2020-01-30 08:29:10 -05:00
Dawid Weiss 3a8ed5e8ed LUCENE-9134: add python-based regeneration of HTMLCharacterEntities.jflex inside jflexHTMLStripCharFilter. 2020-01-30 13:48:16 +01:00
Dawid Weiss 043dd207b6 LUCENE-9080: this jflex file got corrupted somehow during previous commit. I regenerated it with ant, along with the final java file. I also added a crlf normalization, encoding and forced-regeneration to ant because it didn't work before. 2020-01-30 13:09:47 +01:00
Adrien Grand 13e2094804 LUCENE-4702: Improve performance for fuzzy queries.
Fuzzy queries with an edit distance of 1 or 2 must visit all blocks whose prefix
length is 1 or 2. By not compressing those, we can trade very little space (a
couple MBs in the case of the wikibigall index) for better query efficiency.
2020-01-30 10:37:39 +01:00
Ignacio Vera a9482911a8
LUCENE-9141: Simplify LatLonShapeXQuery API by adding a new abstract class called LatLonGeometry. (#1170) 2020-01-30 08:03:22 +01:00
Robert Muir 29469b454f
LUCENE-9192: speed up more slow tests 2020-01-29 14:31:32 -05:00
Ignacio Vera c98229948a
LUCENE-9152: Improve line intersection detection for polygons (#1187) 2020-01-29 19:24:51 +01:00
Dawid Weiss e25dac085f LUCENE-9134: this adds initial javacc support (without follow-up tweaks required to make the sources identical as those generated by ant). 2020-01-29 17:02:59 +01:00
Adrien Grand 7941d109bd SOLR-13897: Fix precommit. 2020-01-28 20:11:47 +01:00
Adrien Grand 92b684c647
LUCENE-9161: DirectMonotonicWriter checks for overflows. (#1197) 2020-01-28 19:06:53 +01:00
Adrien Grand 6eb8834a57
LUCENE-4702: Reduce terms dictionary compression overhead. (#1216)
Changes include:
 - Removed LZ4 compression of suffix lengths which didn't save much space
   anyway.
 - For stats, LZ4 was only really used for run-length compression of terms whose
   docFreq is 1. This has been replaced by explicit run-length compression.
 - Since we only use LZ4 for suffix bytes if the compression ration is < 75%, we
   now only try LZ4 out if the average suffix length is greater than 6, in order
   to reduce index-time overhead.
2020-01-28 18:38:30 +01:00
Robert Muir 4773574578
LUCENE-9189: TestIndexWriterDelete.testDeletesOnDiskFull can run for minutes
The issue is that MockDirectoryWrapper's disk full check is horribly
inefficient. On every writeByte/etc, it totally recomputes disk space
across all files. This means it calls listAll() on the underlying
Directory (which sorts all the underlying files), then sums up fileLength()
for each of those files.

This leads to many pathological cases in the disk full tests... but the
number of tests impacted by this is minimal, and the logic is scary.
2020-01-28 12:24:31 -05:00
Robert Muir 3bcc97c8eb
LUCENE-9186: remove linefiledocs usage from BaseTokenStreamTestCase 2020-01-28 11:55:51 -05:00
Robert Muir 4350efa932
LUCENE-9187: remove too-expensive assert from LZ4 HighCompressionHashTable 2020-01-28 11:45:43 -05:00
Robert Muir e504798a44
LUCENE-9185: add "tests.profile" to gradle build to aid fixing slow tests
Run test(s) with -Ptests.profile=true to print a histogram at the end of
the build.
2020-01-28 11:27:18 -05:00
Cassandra Targett 1a14c67426 Ref Guide: Remove outdated or invalid links to Solr Wiki; update URL of those that remain 2020-01-27 16:38:31 -06:00
Cassandra Targett b2f51f1941 Ref Guide: fix undefined substitution error caused by formatting of variables in paths 2020-01-27 16:38:30 -06:00
Jan Høydahl 53f7b394e4 SOLR-11207: Mute warnings for owasp false positives 2020-01-27 21:03:20 +01:00
Dawid Weiss ff635cf701 LUCENE-9184, LUCENE-9183: allow skipping git status check in precommit with -Pvalidation.git.failOnModified=false (or place this in gradle.properties to make it permanent). 2020-01-27 20:47:02 +01:00
Uwe Schindler 7dc35e3a62 Let precommit depend on generic forbiddenApis task 2020-01-27 19:47:54 +01:00
Adrien Grand 9e4c445d17 LUCENE-4702: CHANGES entry. 2020-01-27 18:27:53 +01:00
Robert Muir fd5a0ce7c2
LUCENE-9182: the rat-sources.gradle was the one .gradle file already with a license header, we don't need it twice 2020-01-27 12:11:44 -05:00
Robert Muir 975df9ddd3
LUCENE-9182: add apache license headers to all .gradle files and enforce in rat task 2020-01-27 12:05:34 -05:00
Dawid Weiss b420ef8f77 LUCENE-9179: don't invoke the same build recursively upon first run, just continue. Seems like gradle bug but let's not cry about it - it just happens once and CI defaults can be passed independently on command-line. 2020-01-27 17:34:13 +01:00
Robert Muir 8e357b167b
LUCENE-9180: dos2unix files that don't need dos line endings 2020-01-27 11:29:59 -05:00
Dawid Weiss a3b0cfcbe2 Moved under help/ 2020-01-27 17:23:41 +01:00
Dawid Weiss 6bde0f3ec8 LUCENE-9134: UAX29URLEmailTokenizerImpl regeneration. This requires TONS
of memory and time... insane compared to the size of the input. None of my
machines pass it without at least 12 gigs of heap (!).
2020-01-27 12:36:13 +01:00
Jan Høydahl 39df74de37 SOLR-11207: Exclude configuration 'unifiedClasspath'
It is generated by consistent-versions plugin and triggers owasp warnings for deps even for excluded projects
2020-01-27 12:17:31 +01:00