Commit Graph

32819 Commits

Author SHA1 Message Date
Adrien Grand 3246b26058 LUCENE-9147: Fix codec excludes. 2020-02-06 10:34:35 +01:00
Houston Putman e0d35f9641 SOLR-13887: Use the default idleTimeout instead of 0 for HTTP2 (#991) 2020-02-05 12:45:14 -08:00
Chris Hostetter bbdfce944b SOLR-14241: New delete() Stream Decorator
(cherry picked from commit c5d0391df9)
2020-02-05 13:31:55 -07:00
Adrien Grand 597141df6b LUCENE-9147: Move the stored fields index off-heap. (#1179)
This replaces the index of stored fields and term vectors with two
`DirectMonotonic` arrays. `DirectMonotonicWriter` needs to know the number
of values to write up-front, so incoming doc IDs and file pointers are buffered
on disk using temporary files that never get fsynced, but have index headers
and footers to make sure any corruption in these files wouldn't propagate to the
index.

`DirectMonotonicReader` gets a specialized `binarySearch` implementation that
leverages the metadata in order to read from the IndexInput as rarely as
possible. In the common case, it only needs to access a single
sub-`DirectReader` which, combined with blocks of 1k values, helps
bound the number of page faults to 2.
2020-02-05 19:19:32 +01:00
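A minimal Java sketch of the two ideas in the LUCENE-9147 (#1179) message: monotonic values are stored per block as small non-negative deltas from a linear estimate, and `binarySearch` first picks a block using per-block metadata, then decodes values from that one block only. Everything here is an assumption for illustration: the `MonotonicBlocks` class, keeping each block's first value in memory as the metadata, and storing deltas in plain `long[]` arrays instead of bit-packed `DirectReader` data. It is not Lucene's `DirectMonotonicWriter`/`DirectMonotonicReader` format.

```java
import java.util.Arrays;

/** Hypothetical in-memory sketch: block-wise monotonic encoding plus a two-level search. */
final class MonotonicBlocks {
  private final int blockSize;
  private final long[] blockBase;  // per-block offset so that every stored delta is >= 0
  private final double[] blockAvg; // per-block average slope used for the linear estimate
  private final long[] blockFirst; // per-block first value, kept as in-memory metadata (assumption)
  private final long[][] deltas;   // per-block deltas (bit-packed on disk in a real codec)
  int decodes;                     // counts decoded values, to show how much data a search touches

  MonotonicBlocks(long[] values, int blockSize) {
    this.blockSize = blockSize;
    int numBlocks = (values.length + blockSize - 1) / blockSize;
    blockBase = new long[numBlocks];
    blockAvg = new double[numBlocks];
    blockFirst = new long[numBlocks];
    deltas = new long[numBlocks][];
    for (int b = 0; b < numBlocks; b++) {
      int start = b * blockSize;
      int count = Math.min(blockSize, values.length - start);
      long first = values[start];
      long last = values[start + count - 1];
      double avg = count > 1 ? (double) (last - first) / (count - 1) : 0;
      long[] raw = new long[count];
      long min = Long.MAX_VALUE;
      for (int i = 0; i < count; i++) {
        raw[i] = values[start + i] - (long) (avg * i); // distance from the linear estimate
        min = Math.min(min, raw[i]);
      }
      for (int i = 0; i < count; i++) {
        raw[i] -= min; // shift so all deltas are non-negative
      }
      blockBase[b] = min;
      blockAvg[b] = avg;
      blockFirst[b] = first;
      deltas[b] = raw;
    }
  }

  /** Decodes one value with the same expression used at encode time, so it is exact. */
  long get(int index) {
    int b = index / blockSize;
    int i = index % blockSize;
    decodes++;
    return blockBase[b] + (long) (blockAvg[b] * i) + deltas[b][i];
  }

  /** Same contract as Arrays.binarySearch: index if found, else -(insertionPoint) - 1. */
  int binarySearch(long target) {
    // Level 1: choose a block using metadata only, without decoding any deltas.
    int blockIdx = Arrays.binarySearch(blockFirst, target);
    if (blockIdx >= 0) {
      return blockIdx * blockSize; // target is the first value of that block
    }
    blockIdx = -blockIdx - 2; // last block whose first value is < target
    if (blockIdx < 0) {
      return -1; // target is smaller than every value
    }
    // Level 2: ordinary binary search inside that single block.
    int start = blockIdx * blockSize;
    int lo = 0;
    int hi = deltas[blockIdx].length - 1;
    while (lo <= hi) {
      int mid = (lo + hi) >>> 1;
      long v = get(start + mid);
      if (v < target) {
        lo = mid + 1;
      } else if (v > target) {
        hi = mid - 1;
      } else {
        return start + mid;
      }
    }
    return -(start + lo) - 1;
  }
}
```

With the values `{2, 3, 5, 8, 13, 21, 34, 55, 89, 144}` and a block size of 4, `binarySearch(34)` returns 6 after decoding 2 values, all from the second block.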
Adrien Grand d007470bda SOLR-14242: HdfsDirectory#createTempOutput. (#1240) 2020-02-05 16:39:30 +01:00
Mike McCandless 3e63cd38ef LUCENE-9200: consistently use double (not float) math for TieredMergePolicy's decisions, to fix a corner-case bug uncovered by randomized tests 2020-02-05 09:52:19 -05:00
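The LUCENE-9200 message does not describe the corner case itself, so the following is only a generic illustration of why mixing `float` and `double` in one decision is fragile: a `float` has a 24-bit significand, so byte counts above 2^24 (16 MiB) can be rounded. The hypothetical helper `exceedsMixed` narrows a limit to `float` before comparing it with a size held in `double`; `exceedsDouble` keeps both sides in `double`.

```java
final class MixedPrecision {
  /** Hypothetical check: the limit is narrowed to float, the size stays in double. */
  static boolean exceedsMixed(long segmentBytes, long limitBytes) {
    float limit = limitBytes;   // long -> float may round (24-bit significand)
    double size = segmentBytes; // exact for values below 2^53
    return size > limit;        // limit is widened back to double, keeping its rounding error
  }

  /** Same check with both operands in double. */
  static boolean exceedsDouble(long segmentBytes, long limitBytes) {
    return (double) segmentBytes > (double) limitBytes;
  }
}
```

For a segment and a limit of exactly 16,777,217 bytes (2^24 + 1), `exceedsMixed` returns `true` because the limit rounds down to 16,777,216, while `exceedsDouble` correctly returns `false`.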
Tomas Fernandez Lobbe 37d4121770 SOLR-14219: Revert changes in OverseerSolrResponse and move serialization (#1227)
SOLR-14095 introduced an issue for rolling restarts (incompatible Java serialization). This change fixes the compatibility issue while keeping the functionality of SOLR-14095.
2020-02-04 11:07:38 -08:00
Adrien Grand d7859097ee SOLR-14238: Fix HdfsDirectory to no longer overwrite existing files. (#1237) 2020-02-04 19:35:52 +01:00
Munendra S N 358043d1f3 SOLR-14090: fix delete-copy-field when source is dynamic field 2020-02-04 21:48:56 +05:30
Munendra S N 5a3a05d953 SOLR-10567: add support for DateRangeField in JSON facet range 2020-02-04 21:47:54 +05:30
Andrzej Bialecki 4a002411fc SOLR-14239: Fix the behavior of CaffeineCache.computeIfAbsent on branch_8x. 2020-02-04 17:02:05 +01:00
Ignacio Vera 996945fff7 LUCENE-9197: fix wrong implementation on Point2D#withinTriangle (#1228) 2020-02-04 07:11:07 +01:00
Anshum Gupta 02f9b276b0 SOLR-14206: Annotate HttpSolrCall as thread-safe (#1205)
* SOLR-14206: Annotate HttpSolrCall and V2HttpCall as thread-safe
2020-02-03 10:00:43 -08:00
Mikhail Khludnev 34d299018e SOLR-12325: uniqueBlock({!v=foo:bar}) 2020-02-02 15:20:16 +03:00
Jan Høydahl e4721d9a2d SOLR-14221: Upgrade restlet to version 2.4.0 (#1211)
(cherry picked from commit 16b8d50284)
2020-02-02 11:45:01 +01:00
Kazuaki Hiraga 12242b52e6 LUCENE-9123: Add new JapaneseTokenizer constructors with discardCompoundToken option to control whether the tokenizer emits original tokens when the mode is not NORMAL. 2020-02-01 15:20:02 +09:00
Munendra S N 43d07db523 fix typo in schema-api documentation 2020-02-01 10:39:33 +05:30
Robert Muir 507ef67d5f support ECJ linting on newer JDK versions
The entire precommit task will still fail with an unsupported Java version
(subsequent checks do not support the newer javadocs format).

But this allows the ECJ linter to run, which checks for things such as
unused imports.
2020-01-31 14:07:03 -05:00
Jason Gerlowski 68cfe27b68 SOLR-13892: Add 'top-level' docValues Join implementation (#1171) 2020-01-31 13:11:28 -05:00
Joel Bernstein d4a4b4413d SOLR-14139: Support backtick phrase queries in Streaming Expressions 2020-01-31 12:14:43 -05:00
Christine Poerschke fc3497d24c LUCENE-9195: precommit fix (remove unused import) 2020-01-31 16:53:12 +00:00
Christine Poerschke 53d8b5b1b8 LUCENE-8530: fix some 'rawtypes' javac warnings 2020-01-31 16:42:25 +00:00
Robert Muir 30b2cc0163 LUCENE-9195: more slow-test fixes 2020-01-31 09:27:01 -05:00
Chris Hostetter b2d8b784a3 New /stream test cases showing authn+authz edge cases in cloud mode
These tests exercise various places in the Streaming Expressions code that use background threads,
to confirm that the expected credentials (or lack thereof) are propagated along.

Test currently has comments + workarounds for 2 known client issues:
 - SOLR-14226: SolrStream reports AuthN/AuthZ failures (401|403) as IOException w/o details
 - SOLR-14222: CloudSolrClient converts (update) 403 error to 500 error

(cherry picked from commit 517438e356)
2020-01-30 10:04:09 -07:00
Adrien Grand 744dec7275 LUCENE-4702: Improve performance for fuzzy queries.
Fuzzy queries with an edit distance of 1 or 2 must visit all blocks whose prefix
length is 1 or 2. By not compressing those, we can trade very little space (a
couple MBs in the case of the wikibigall index) for better query efficiency.
2020-01-30 10:40:44 +01:00
Ignacio Vera 46fa876c35 LUCENE-9141: Simplify LatLonShapeXQuery API by adding a new abstract class called LatLonGeometry. (#1170) 2020-01-30 08:04:09 +01:00
Robert Muir e258ab32f0 LUCENE-9192: speed up more slow tests 2020-01-29 14:33:05 -05:00
Robert Muir 16f240e740 LUCENE-9160: add params/docs to override jvm params in gradle build, default C2 off in tests.
Adds some build parameters to tune how tests run. There is an example
shown by "gradle helpLocalSettings"

C2 is now off by default in tests, as it is wasteful locally and slows down
test runs. You can override this by setting tests.jvmargs for gradle,
or args for ant.

Some crazy Lucene stress tests may need to be toned down after the
change, as they may have been doing too many iterations by default...
but this is not a new problem.
2020-01-29 13:59:07 -05:00
Robert Muir 5ee2a6fcae fix merging difficulty while trying to give branch_8x some love 2020-01-29 13:57:48 -05:00
Robert Muir e1cc7eb4b7 LUCENE-9189: TestIndexWriterDelete.testDeletesOnDiskFull can run for minutes
The issue is that MockDirectoryWrapper's disk full check is horribly
inefficient. On every writeByte/etc, it totally recomputes disk space
across all files. This means it calls listAll() on the underlying
Directory (which sorts all the underlying files), then sums up fileLength()
for each of those files.

This leads to many pathological cases in the disk full tests... but the
number of tests impacted by this is minimal, and the logic is scary.
2020-01-29 13:47:05 -05:00
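To make the cost described in the LUCENE-9189 message concrete, the hypothetical `DiskUsage` class below (not a Lucene API) tracks file lengths in a map and offers both patterns: `recomputeTotal()` lists the files in sorted order and sums every length, as the message says MockDirectoryWrapper does on every write, while `write()` also maintains an O(1) running total. The running total is only an illustration of the difference, not a description of what the commit changed.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeSet;

/** Hypothetical model of per-write disk usage accounting. */
final class DiskUsage {
  private final Map<String, Long> fileLengths = new HashMap<>();
  private long runningTotal;
  long filesVisited; // number of file lengths summed by recomputeTotal() so far

  /** Records that `bytes` were appended to `file`; updates the running total in O(1). */
  void write(String file, long bytes) {
    fileLengths.merge(file, bytes, Long::sum);
    runningTotal += bytes;
  }

  /** Pattern described in the commit: list all files (sorted), then sum every length. */
  long recomputeTotal() {
    long total = 0;
    for (String file : new TreeSet<>(fileLengths.keySet())) {
      total += fileLengths.get(file);
      filesVisited++;
    }
    return total;
  }

  long runningTotal() {
    return runningTotal;
  }
}
```

Calling `recomputeTotal()` after each of 1,000 one-byte writes spread over 100 files sums 95,050 file lengths in total, whereas the running total costs one addition per write.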
Robert Muir 3dd47cf9c7 LUCENE-9186: remove linefiledocs usage from BaseTokenStreamTestCase 2020-01-29 13:46:41 -05:00
Robert Muir 037cc5b1de LUCENE-9172: nuke some compiler warnings 2020-01-29 13:45:59 -05:00
Robert Muir 0cc67223a8 SOLR-14217: tests respect tests.workDir correctly (prevent SSD destruction) 2020-01-29 13:44:51 -05:00
Robert Muir 2fd904e3d0 LUCENE-9167: test speedup for slowest/pathological tests (round 3) 2020-01-29 13:44:19 -05:00
Robert Muir 70b60734a1 LUCENE-9163: test speedup for slowest/pathological tests
Calm down individual test methods that showed double-digit execution times
when running the tests many times.

There are a few more issues remaining, but this solves the majority of them.
2020-01-29 13:41:57 -05:00
Robert Muir f787c093b3 TestPointValues: only index 300k docs in the NIGHTLY configuration; that is too much locally 2020-01-29 13:40:51 -05:00
Robert Muir 1faf5aa6c7 mark StressRamUsageEstimator tests nightly.
This is consistently the slowest test for me in all of Lucene core by
far. Takes around an entire minute. Mark it nightly: should catch any
issues with RAM estimation but keep local builds fast.
2020-01-29 13:40:24 -05:00
Ignacio Vera 29542c7f59 LUCENE-9152: Improve line intersection detection for polygons (#1187) 2020-01-29 19:25:46 +01:00
Adrien Grand 47c01af394 SOLR-13897: Fix precommit. 2020-01-28 20:11:24 +01:00
Adrien Grand 25fc09ee9e LUCENE-9161: DirectMonotonicWriter checks for overflows. (#1197) 2020-01-28 19:10:46 +01:00
Adrien Grand 033220e2ab LUCENE-4702: Reduce terms dictionary compression overhead. (#1216)
Changes include:
 - Removed LZ4 compression of suffix lengths which didn't save much space
   anyway.
 - For stats, LZ4 was only really used for run-length compression of terms whose
   docFreq is 1. This has been replaced by explicit run-length compression.
 - Since we only use LZ4 for suffix bytes if the compression ratio is < 75%, we
   now only try LZ4 out if the average suffix length is greater than 6, in order
   to reduce index-time overhead.
2020-01-28 19:09:59 +01:00
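The two heuristics in the LUCENE-4702 (#1216) message can be sketched in Java as follows, under loud assumptions: stats are reduced to an array of docFreq values, a run of docFreq == 1 terms is written as one token `(runLength << 1) | 1` while any other docFreq is written as `docFreq << 1`, and the token list stands in for variable-length integers on disk. `shouldTryLz4` and `keepLz4` only encode the thresholds quoted in the message (average suffix length greater than 6, compressed size below 75% of the original). None of these names or layouts are Lucene's.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of explicit run-length encoding for docFreq == 1 terms. */
final class StatsRunLength {
  /** Low bit 1: a run of docFreq == 1 terms; low bit 0: one explicit docFreq. */
  static List<Long> encode(int[] docFreqs) {
    List<Long> out = new ArrayList<>();
    int i = 0;
    while (i < docFreqs.length) {
      if (docFreqs[i] == 1) {
        int run = 0;
        while (i < docFreqs.length && docFreqs[i] == 1) {
          run++;
          i++;
        }
        out.add(((long) run << 1) | 1L);
      } else {
        out.add((long) docFreqs[i] << 1);
        i++;
      }
    }
    return out;
  }

  static int[] decode(List<Long> tokens, int numTerms) {
    int[] docFreqs = new int[numTerms];
    int i = 0;
    for (long token : tokens) {
      if ((token & 1L) != 0) {
        for (long r = 0; r < (token >>> 1); r++) {
          docFreqs[i++] = 1; // expand the run
        }
      } else {
        docFreqs[i++] = (int) (token >>> 1);
      }
    }
    return docFreqs;
  }

  /** Only attempt LZ4 when the average suffix length is greater than 6 bytes. */
  static boolean shouldTryLz4(int totalSuffixBytes, int numTerms) {
    return numTerms > 0 && (double) totalSuffixBytes / numTerms > 6;
  }

  /** Keep the LZ4 output only if it is below 75% of the original size. */
  static boolean keepLz4(int compressedBytes, int originalBytes) {
    return compressedBytes < 0.75 * originalBytes;
  }
}
```

For docFreqs `{1, 1, 1, 5, 1, 2, 1, 1}`, `encode` produces the five tokens `[7, 10, 3, 4, 5]`, and `decode` restores the original eight values.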
Cassandra Targett 088e6c3006 Ref Guide: Remove outdated or invalid links to Solr Wiki; update URL of those that remain 2020-01-27 16:39:22 -06:00
Cassandra Targett d5bacc9a1c Ref Guide: fix undefined substitution error caused by formatting of variables in paths 2020-01-27 16:39:10 -06:00
Adrien Grand 666bdac64d LUCENE-4702: CHANGES entry. 2020-01-27 18:28:22 +01:00
Adrien Grand 33a7af9cbf LUCENE-4702: Terms dictionary compression. (#1126)
Compress blocks of suffixes in order to make the terms dictionary more
space-efficient. Two compression algorithms are used depending on which one is
more space-efficient:
 - LowercaseAsciiCompression, which applies when all bytes are in the
   `[0x1F,0x3F)` or `[0x5F,0x7F)` ranges, which notably include all digits,
   lowercase ASCII characters, '.', '-' and '_', and encodes 4 chars in 3 bytes.
   It is very often applicable on analyzed content and decompresses very quickly
   thanks to auto-vectorization support in the JVM.
 - LZ4, when the compression ratio is less than 0.75.

I was a bit unhappy with the complexity of the high-compression LZ4 option, so
I simplified it in order to only keep the logic that detects duplicate strings.
The logic about what to do in case overlapping matches are found, which was
responsible for most of the complexity while only yielding tiny benefits, has
been removed.
2020-01-27 18:28:18 +01:00
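The two byte ranges quoted for LowercaseAsciiCompression contain 32 values each, so every eligible byte fits in 6 bits, and four 6-bit codes fill exactly three bytes. The hypothetical `SixBitPacking` class below shows only that arithmetic; the code mapping (subtract 0x1F, or subtract 0x5F and add 32), the bit order and the zero padding of a short final group are assumptions, and Lucene's actual layout may differ.

```java
/** Hypothetical 4-chars-in-3-bytes packing for bytes in [0x1F,0x3F) or [0x5F,0x7F). */
final class SixBitPacking {
  /** Maps an eligible byte to a 6-bit code, or returns -1 if it is outside both ranges. */
  static int toCode(byte b) {
    if (b >= 0x1F && b < 0x3F) return b - 0x1F;      // codes 0..31
    if (b >= 0x5F && b < 0x7F) return b - 0x5F + 32; // codes 32..63
    return -1;
  }

  static byte fromCode(int code) {
    return (byte) (code < 32 ? code + 0x1F : code - 32 + 0x5F);
  }

  /** Packs each group of 4 bytes into 3; returns null if any byte is ineligible. */
  static byte[] compress(byte[] in) {
    int groups = (in.length + 3) / 4;
    byte[] out = new byte[groups * 3];
    for (int g = 0; g < groups; g++) {
      int packed = 0; // 24 bits: four 6-bit codes, first code in the highest bits
      for (int k = 0; k < 4; k++) {
        int idx = g * 4 + k;
        int code = 0; // padding for a short final group
        if (idx < in.length) {
          code = toCode(in[idx]);
          if (code < 0) return null;
        }
        packed = (packed << 6) | code;
      }
      out[g * 3] = (byte) (packed >>> 16);
      out[g * 3 + 1] = (byte) (packed >>> 8);
      out[g * 3 + 2] = (byte) packed;
    }
    return out;
  }

  /** Reverses compress(); `length` is the original number of bytes. */
  static byte[] decompress(byte[] packedBytes, int length) {
    byte[] out = new byte[length];
    for (int g = 0; g * 4 < length; g++) {
      int packed = ((packedBytes[g * 3] & 0xFF) << 16)
          | ((packedBytes[g * 3 + 1] & 0xFF) << 8)
          | (packedBytes[g * 3 + 2] & 0xFF);
      for (int k = 0; k < 4 && g * 4 + k < length; k++) {
        out[g * 4 + k] = fromCode((packed >>> (18 - 6 * k)) & 0x3F);
      }
    }
    return out;
  }
}
```

A 12-byte suffix such as `lucene-9.0_x` packs into 9 bytes, while anything containing an uppercase letter such as `L` (0x4C) is rejected, which is when the LZ4 alternative from the message would be considered instead.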
Adrien Grand ace4fcc7be LUCENE-9116: Remove long[] from `PostingsWriterBase#encodeTerm`. (#1149) (#1158)
All the metadata can be directly encoded in the `DataOutput`.
2020-01-27 18:28:18 +01:00
Robert Muir d614bb854d LUCENE-9180: dos2unix files that don't need dos line endings. gitignore gradle-specific stuff that shows up modified if you switch branches, no gradle here. 2020-01-27 11:31:59 -05:00
Alan Woodward 4bf883ddb8 LUCENE-9153: Allow WhitespaceAnalyzer to set a custom maxTokenLen (#1198)
WhitespaceTokenizer defaults to a maximum token length of 255, and WhitespaceAnalyzer
does not allow this to be changed. This commit adds an optional maxTokenLen parameter
to WhitespaceAnalyzer as well, and documents the existing token length restriction.
2020-01-27 09:22:56 +00:00
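The LUCENE-9153 message notes that WhitespaceTokenizer caps tokens at 255 characters by default. Assuming it follows Lucene's CharTokenizer behavior, where a run of non-whitespace longer than the cap is cut into consecutive tokens rather than dropped, the standard-library Java sketch below models that effect. `WhitespaceSplit` is hypothetical: it ignores surrogate pairs, offsets and attributes, and is neither the Lucene tokenizer nor the new analyzer parameter.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of whitespace tokenization with a maximum token length. */
final class WhitespaceSplit {
  static final int DEFAULT_MAX_TOKEN_LEN = 255; // default quoted in the commit message

  /** Splits on whitespace; a run longer than maxTokenLen is emitted as several chunks (assumed). */
  static List<String> tokenize(String text, int maxTokenLen) {
    if (maxTokenLen <= 0) {
      throw new IllegalArgumentException("maxTokenLen must be > 0");
    }
    List<String> tokens = new ArrayList<>();
    StringBuilder current = new StringBuilder();
    for (int i = 0; i < text.length(); i++) {
      char c = text.charAt(i);
      if (Character.isWhitespace(c)) {
        if (current.length() > 0) { // whitespace ends the current token
          tokens.add(current.toString());
          current.setLength(0);
        }
      } else {
        current.append(c);
        if (current.length() == maxTokenLen) { // cap reached: emit and keep going
          tokens.add(current.toString());
          current.setLength(0);
        }
      }
    }
    if (current.length() > 0) {
      tokens.add(current.toString());
    }
    return tokens;
  }
}
```

With the default cap, a 300-character run comes out as a 255-character token followed by a 45-character one, which illustrates why a configurable limit can be useful for fields containing long unbroken strings.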
Ignacio Vera 89c72a693b LUCENE-9176: Handle the case when there is only one leaf node in TestEstimatePointCount (#1212) 2020-01-27 09:53:18 +01:00
Andrzej Bialecki df91041652 SOLR-14211: Fix a bug introduced in SOLR-14192. 2020-01-27 09:24:58 +01:00