33538 Commits

Author SHA1 Message Date
Robert Muir 860115e450
LUCENE-9209: revert changes to test html file, not intended 2020-02-06 22:40:40 -05:00
Robert Muir 0d339043e3
LUCENE-9209: fix javadocs to be html5, enable doclint html checks, remove jtidy
Current javadocs declare an HTML5 doctype: !DOCTYPE HTML. Some HTML5
features are used, but unfortunately also some constructs that do not
exist in HTML5 are used as well.

Because of this, we have no checking of any html syntax. jtidy is
disabled because it works with html4. doclint is disabled because it
works with html5. our docs are neither.

javadoc "doclint" feature can efficiently check that the html isn't
crazy. we just have to fix really ancient removed/deprecated stuff
(such as use of tt tag).

This enables the html checking in both ant and gradle. The docs are
fixed via straightforward transformations.

One exception is table cellpadding, for this some helper CSS classes
were added to make the transition easier (since it must apply padding
to inner th/td, not possible inline). I added TODOs, we should clean
this up. Most problems look like they may have been generated from a
GUI or similar and not a human.
2020-02-06 22:30:52 -05:00
Mike abd282d258
LUCENE-9142 Refactor IntSet operations for determinize (#1184)
* LUCENE-9142 Refactor SortedIntSet for equality

Split SortedIntSet into a class hierarchy to make comparisons to
FrozenIntSet more meaningful. Use Arrays.equals for more efficient
comparison. Add tests for IntSet to verify correctness.
2020-02-06 12:16:45 -08:00
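The comparison idea in LUCENE-9142 can be sketched as follows. This is a minimal illustration, not the SortedIntSet/FrozenIntSet code from the commit: the class `IntSetSketch` and its methods `add`, `frozen`, and `sameValues` are hypothetical names. It assumes Java 9+ for the range overload of `Arrays.equals`, which compares the live prefix of a growable buffer against an immutable snapshot without a hand-written loop.

```java
import java.util.Arrays;

// Hypothetical illustration only; not the classes touched by the commit.
final class IntSetSketch {
    private int[] values = new int[4]; // sorted; only the first `size` slots are valid
    private int size;

    // Inserts v, keeping the buffer sorted and free of duplicates.
    void add(int v) {
        int idx = Arrays.binarySearch(values, 0, size, v);
        if (idx >= 0) {
            return; // already present
        }
        int insertAt = -idx - 1;
        if (size == values.length) {
            values = Arrays.copyOf(values, size * 2);
        }
        System.arraycopy(values, insertAt, values, insertAt + 1, size - insertAt);
        values[insertAt] = v;
        size++;
    }

    // Immutable snapshot of the current contents.
    int[] frozen() {
        return Arrays.copyOf(values, size);
    }

    // Compares the live prefix of the buffer against a frozen snapshot.
    boolean sameValues(int[] frozen) {
        return Arrays.equals(values, 0, size, frozen, 0, frozen.length);
    }
}
```

Because both sides are kept sorted, equal sets always have identical array contents, so a single array comparison is enough.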
Tomoko Uchida f3cd1dbde3 LUCENE-9077: Force locale en_US on Javadoc task (workaround for JDK-8222793) 2020-02-07 01:36:45 +09:00
Adrien Grand 85dba7356f LUCENE-9147: Make sure temporary files get deleted on all code paths. 2020-02-06 17:13:28 +01:00
Robert Muir 7f4560c59a
LUCENE-9199: allow building javadocs on java 13+ 2020-02-06 10:39:41 -05:00
Alan Woodward 7c1ba1aebe
LUCENE-9099: Correctly handle repeats in ORDERED and UNORDERED intervals (#1097)
If you have repeating intervals in an ordered or unordered interval source, you currently 
get somewhat confusing behaviour:

* `ORDERED(a, a, b)` will return an extra interval over just a b if it first matches a a b, meaning
that you can get incorrect results if used in a `CONTAINING` filter - 
`CONTAINING(ORDERED(x, y), ORDERED(a, a, b))` will match on the document `a x a b y`
* `UNORDERED(a, a)` will match on documents that contain just a single a.

This commit adds a RepeatingIntervalsSource that correctly handles repeats within 
ordered and unordered sources. It also changes the way that gaps are calculated within 
ordered and unordered sources, by using a new width() method on IntervalIterator. The 
default implementation just returns end() - start() + 1, but RepeatingIntervalsSource 
instead returns the sum of the widths of its child iterators. This preserves maxgaps filtering 
on ordered and unordered sources that contain repeats.

In order to correctly handle matches in this scenario, IntervalsSource#matches now always 
returns an explicit IntervalsMatchesIterator rather than a plain MatchesIterator, which adds 
gaps() and width() methods so that submatches can be combined in the same way that 
subiterators are. Extra checks have been added to checkIntervals() to ensure that the same 
intervals are returned by both iterator and matches, and a fix to 
DisjunctionIntervalIterator#matches() is also included - DisjunctionIntervalIterator minimizes 
its intervals, while MatchesUtils.disjunction does not, so there was a discrepancy between 
the two methods.
2020-02-06 14:44:47 +00:00
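The width bookkeeping described in LUCENE-9099 can be sketched as follows. This is an illustration under assumptions, not the Lucene intervals API: the class `IntervalWidthSketch` and the helpers `repeatedWidth` and `gaps` are hypothetical names. The single-interval width is `end - start + 1` as stated in the commit message; the gap computation (span width minus the summed child widths) is only one plausible reading of how summing widths keeps maxgaps filtering meaningful.

```java
// Hypothetical illustration; not the IntervalIterator API.
final class IntervalWidthSketch {
    // Width of a single interval with inclusive start and end positions.
    static int width(int start, int end) {
        return end - start + 1;
    }

    // Width of a source built from repeated child intervals: the sum of the
    // child widths. Each child is given as {start, end}.
    static int repeatedWidth(int[][] children) {
        int sum = 0;
        for (int[] c : children) {
            sum += width(c[0], c[1]);
        }
        return sum;
    }

    // Gaps inside a combined match: the total span minus the positions
    // covered by the children. Assumes children are sorted by start and
    // do not overlap.
    static int gaps(int[][] children) {
        int spanStart = children[0][0];
        int spanEnd = children[children.length - 1][1];
        return width(spanStart, spanEnd) - repeatedWidth(children);
    }
}
```

For the document `a x a b`, an `ORDERED(a, a, b)` match covers positions 0, 2 and 3, so the span has width 4, the children sum to 3, and one gap position remains.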
Adrien Grand fdf5ade727 LUCENE-9147: Fix codec excludes. 2020-02-06 10:34:03 +01:00
Adrien Grand 1b882246d7 LUCENE-9147: Avoid reusing file names with FileSwitchDirectory or NRTCachingDirectory and IOContext randomization. 2020-02-06 08:27:33 +01:00
Robert Muir 63be99bf12
SOLR-14118: default embedded zookeeper port to localhost 2020-02-05 21:33:37 -05:00
Robert Muir 196ec5f4a8
LUCENE-9206: add forbidden api exclusion to new class 2020-02-05 20:30:18 -05:00
Marcus bc5f837344
SOLR-14147 change the Security manager to default to true. (#1141)
* change the Security manager to default.
* update the ref-guide.
* uncomment init scripts update changes.
* changed the ref guide and re-commented file.
* remove added comment.
* modified shell script.
* removed comment in windows file.

Signed-off-by: marcussorealheis <marcuseagan@gmail.com>

* bashism and fix windows
* remove space

Signed-off-by: marcussorealheis <marcuseagan@gmail.com>
2020-02-05 19:17:55 -05:00
Robert Muir 93b83f635d
LUCENE-9206: Improve IndexMergeTool defaults and options
IndexMergeTool previously had no options and always forceMerge(1)
the resulting index. This can result in wasted work and confusing
performance (unbalancing the index).

Instead the default is to not do anything, except merges from the
merge policy.
2020-02-05 16:31:07 -05:00
Houston Putman 80ed8c281b
SOLR-13887: Use the default idleTimeout instead of 0 for HTTP2 (#991) 2020-02-05 11:15:37 -08:00
Chris Hostetter c5d0391df9 SOLR-14241: New delete() Stream Decorator 2020-02-05 10:49:24 -07:00
Adrien Grand 136dcbdbbc
LUCENE-9147: Move the stored fields index off-heap. (#1179)
This replaces the index of stored fields and term vectors with two
`DirectMonotonic` arrays. `DirectMonotonicWriter` needs to know the number
of values to write up-front, so incoming doc IDs and file pointers are buffered
on disk using temporary files that never get fsynced, but have index headers
and footers to make sure any corruption in these files wouldn't propagate to the
index.

`DirectMonotonicReader` gets a specialized `binarySearch` implementation that
leverages the metadata in order to avoid going to the IndexInput as much as
possible. In the common case, it would only go to a single
sub `DirectReader` which, combined with the size of blocks of 1k values, helps
bound the number of page faults to 2.
2020-02-05 18:35:08 +01:00
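The idea of using in-memory metadata to limit reads, as described in LUCENE-9147, can be sketched as a two-level search. This is not the `DirectMonotonicReader` implementation: the class `BlockedMonotonicSearchSketch`, its fields, and the in-memory `blocks` array (standing in for on-disk blocks) are all assumptions made for illustration. The sketch first searches the per-block minimum values, then looks inside exactly one block.

```java
import java.util.Arrays;

// Hypothetical two-level search; not the DirectMonotonicReader code.
final class BlockedMonotonicSearchSketch {
    private final int blockSize;
    private final long[] blockMins; // in-memory metadata: first value of each block
    private final long[][] blocks;  // stands in for on-disk blocks of sorted values

    BlockedMonotonicSearchSketch(long[] sorted, int blockSize) {
        this.blockSize = blockSize;
        int numBlocks = (sorted.length + blockSize - 1) / blockSize;
        blockMins = new long[numBlocks];
        blocks = new long[numBlocks][];
        for (int b = 0; b < numBlocks; b++) {
            int from = b * blockSize;
            int to = Math.min(sorted.length, from + blockSize);
            blocks[b] = Arrays.copyOfRange(sorted, from, to);
            blockMins[b] = sorted[from];
        }
    }

    // Returns the global index of key, or -1 if absent. Reads at most one block.
    int indexOf(long key) {
        int b = Arrays.binarySearch(blockMins, key);
        if (b < 0) {
            b = -b - 2; // last block whose first value is <= key
        }
        if (b < 0) {
            return -1; // key is smaller than every stored value
        }
        int i = Arrays.binarySearch(blocks[b], key);
        return i < 0 ? -1 : b * blockSize + i;
    }
}
```

Only the second `binarySearch` would touch disk-backed data in a real reader, which is what keeps the number of page faults small.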
Adrien Grand fe349ddcf2
SOLR-14242: HdfsDirectory#createTempOutput. (#1240) 2020-02-05 16:38:53 +01:00
Mike McCandless 47386f8cca LUCENE-9200: consistently use double (not float) math for TieredMergePolicy's decisions, to fix a corner-case bug uncovered by randomized tests 2020-02-05 09:51:31 -05:00
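The LUCENE-9200 message does not describe the exact corner case, so the following only illustrates the general precision issue behind preferring double over float. A `float` has a 24-bit significand, so byte counts above 2^24 are rounded and two distinct sizes can compare as equal. The class `FloatVsDoubleSketch` and its helpers are hypothetical names, not TieredMergePolicy code.

```java
// Hypothetical illustration; not TieredMergePolicy code.
final class FloatVsDoubleSketch {
    // Ratio of two segment sizes (in bytes) computed in float precision.
    static float ratioFloat(long a, long b) {
        return (float) a / (float) b;
    }

    // The same ratio computed in double precision.
    static double ratioDouble(long a, long b) {
        return (double) a / (double) b;
    }
}
```

With sizes 16,777,217 and 16,777,216 bytes, the float ratio is exactly 1.0 because both values round to the same float, while the double ratio is slightly above 1.0.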
Adrien Grand 2d8428ec2e
SOLR-14238: Fix HdfsDirectory to no longer overwrite existing files. (#1237) 2020-02-04 19:35:15 +01:00
Tomas Fernandez Lobbe bb90569f1d
SOLR-14219: Revert changes in OverseerSolrResponse and move serialization (#1227)
SOLR-14095 introduced an issue for rolling restarts (incompatible Java serialization). This change fixes the compatibility issue while keeping the functionality in SOLR-14095.
2020-02-04 10:26:57 -08:00
Munendra S N c91dd9d0e4 SOLR-14090: fix delete-copy-field when source is dynamic field 2020-02-04 21:33:31 +05:30
Munendra S N 4eff9c9b5e SOLR-10567: add support for DateRangeField in JSON facet range 2020-02-04 21:26:40 +05:30
Erick Erickson b0bb299dc4
LUCENE-9134: Port ant-regenerate tasks to Gradle build (#1230)
LUCENE-9134: Port ant-regenerate tasks to Gradle build (Solr javacc)
2020-02-04 09:16:38 -05:00
Ignacio Vera 641680fbf1
LUCENE-9197: fix wrong implementation on Point2D#withinTriangle (#1228) 2020-02-04 07:10:08 +01:00
Erick Erickson d3ac1329a3
LUCENE-8656: Deprecations in FuzzyQuery (#1229)
LUCENE-8656: Deprecations in FuzzyQuery

Closes #1229
2020-02-03 08:52:33 -05:00
Mikhail Khludnev d8bc9bcfcf SOLR-12325: uniqueBlock(\{!v=foo:bar}) 2020-02-02 15:15:35 +03:00
Jan Høydahl 16b8d50284
SOLR-14221: Upgrade restlet to version 2.4.0 (#1211) 2020-02-02 11:35:14 +01:00
Kazuaki Hiraga b457c2ee2e LUCENE-9123: Add new JapaneseTokenizer constructors with discardCompoundToken option to control whether the tokenizer emits original tokens when the mode is not NORMAL. 2020-02-01 14:51:09 +09:00
Munendra S N a2c53dad72 fix typo in schema-api documentation 2020-02-01 10:21:52 +05:30
Erick Erickson 5253c0cb74
LUCENE-9134 Port ant-regenerate tasks to Gradle build (#1226)
LUCENE-9134: Port ant-regenerate tasks to Gradle build Javacc sub-task. Closes #1226
2020-01-31 17:04:10 -05:00
Robert Muir 7382375d8a
support ECJ linting on newer JDK versions
The entire precommit task will still fail with unsupported java version
(subsequent checks do not support the newer javadocs format).

But this allows the ECJ linter to run, which checks for things such as
unused imports.
2020-01-31 14:16:04 -05:00
Joel Bernstein db78f6cd00 SOLR-14139: Support backtick phrase queries in Streaming Expressions 2020-01-31 11:54:14 -05:00
Christine Poerschke 0c1b19a321 LUCENE-8530: fix some 'rawtypes' javac warnings 2020-01-31 16:40:55 +00:00
Jason Gerlowski 719b38c8d8
SOLR-13892: Add 'top-level' docValues Join implementation (#1171) 2020-01-31 11:21:01 -05:00
Robert Muir 9ceaff913e
LUCENE-9195: more slow tests fixes 2020-01-31 07:57:34 -05:00
Robert Muir ed7f507c3c
LUCENE-9193: fix documentation typo for gradle tests 2020-01-30 23:54:31 -05:00
Chris Hostetter 517438e356 New /stream test cases showing authn+authz edge cases in cloud mode
This triggers various places in the Streaming Expressions code that use background threads
to confirm that the expected credentials (or lack of) are propagated along.

Test currently has comments + workarounds for 2 known client issues:
 - SOLR-14226: SolrStream reports AuthN/AuthZ failures (401|403) as IOException w/o details
 - SOLR-14222: CloudSolrClient converts (update) 403 error to 500 error
2020-01-30 10:01:03 -07:00
Robert Muir 4b5105e167
LUCENE-9193: heap allocations for tests.profile
Can be a bit noisier than cpu sampling, due to how threads are allocated
in tests... maybe we can improve that in the future.
2020-01-30 08:29:10 -05:00
Dawid Weiss 3a8ed5e8ed LUCENE-9134: add python-based regeneration of HTMLCharacterEntities.jflex inside jflexHTMLStripCharFilter. 2020-01-30 13:48:16 +01:00
Dawid Weiss 043dd207b6 LUCENE-9080: this jflex file got corrupted somehow during previous commit. I regenerated it with ant, along with the final java file. I also added a crlf normalization, encoding and forced-regeneration to ant because it didn't work before. 2020-01-30 13:09:47 +01:00
Adrien Grand 13e2094804 LUCENE-4702: Improve performance for fuzzy queries.
Fuzzy queries with an edit distance of 1 or 2 must visit all blocks whose prefix
length is 1 or 2. By not compressing those, we can trade very little space (a
couple MBs in the case of the wikibigall index) for better query efficiency.
2020-01-30 10:37:39 +01:00
Ignacio Vera a9482911a8
LUCENE-9141: Simplify LatLonShapeXQuery API by adding a new abstract class called LatLonGeometry. (#1170) 2020-01-30 08:03:22 +01:00
Robert Muir 29469b454f
LUCENE-9192: speed up more slow tests 2020-01-29 14:31:32 -05:00
Ignacio Vera c98229948a
LUCENE-9152: Improve line intersection detection for polygons (#1187) 2020-01-29 19:24:51 +01:00
Dawid Weiss e25dac085f LUCENE-9134: this adds initial javacc support (without follow-up tweaks required to make the sources identical to those generated by ant). 2020-01-29 17:02:59 +01:00
Adrien Grand 7941d109bd SOLR-13897: Fix precommit. 2020-01-28 20:11:47 +01:00
Adrien Grand 92b684c647
LUCENE-9161: DirectMonotonicWriter checks for overflows. (#1197) 2020-01-28 19:06:53 +01:00
Adrien Grand 6eb8834a57
LUCENE-4702: Reduce terms dictionary compression overhead. (#1216)
Changes include:
 - Removed LZ4 compression of suffix lengths which didn't save much space
   anyway.
 - For stats, LZ4 was only really used for run-length compression of terms whose
   docFreq is 1. This has been replaced by explicit run-length compression.
 - Since we only use LZ4 for suffix bytes if the compression ratio is < 75%, we
   now only try LZ4 out if the average suffix length is greater than 6, in order
   to reduce index-time overhead.
2020-01-28 18:38:30 +01:00
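The two thresholds named in the LUCENE-4702 message above can be sketched as simple predicates. This is a reading of the commit message, not the terms dictionary writer itself: the class `SuffixCompressionSketch` and both method names are hypothetical, and the exact comparisons used by Lucene may differ.

```java
// Hypothetical illustration of the heuristics in the commit message.
final class SuffixCompressionSketch {
    // Only attempt LZ4 when the average suffix length is greater than 6,
    // to avoid paying index-time overhead on blocks unlikely to compress.
    static boolean shouldTryLz4(long totalSuffixBytes, int numTerms) {
        return numTerms > 0 && totalSuffixBytes > 6L * numTerms;
    }

    // Keep the compressed form only if it is below 75% of the raw size.
    static boolean keepCompressed(int compressedLen, int rawLen) {
        return compressedLen < rawLen * 0.75;
    }
}
```

The first check is cheap and runs before any compression work; the second runs only after LZ4 has actually been tried.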
Robert Muir 4773574578
LUCENE-9189: TestIndexWriterDelete.testDeletesOnDiskFull can run for minutes
The issue is that MockDirectoryWrapper's disk full check is horribly
inefficient. On every writeByte/etc, it totally recomputes disk space
across all files. This means it calls listAll() on the underlying
Directory (which sorts all the underlying files), then sums up fileLength()
for each of those files.

This leads to many pathological cases in the disk full tests... but the
number of tests impacted by this is minimal, and the logic is scary.
2020-01-28 12:24:31 -05:00
Robert Muir 3bcc97c8eb
LUCENE-9186: remove linefiledocs usage from BaseTokenStreamTestCase 2020-01-28 11:55:51 -05:00