lucene

Commit Graph

Author	SHA1	Message	Date
Robert Muir	860115e450	LUCENE-9209: revert changes to test html file, not intended	2020-02-06 22:40:40 -05:00
Robert Muir	0d339043e3	LUCENE-9209: fix javadocs to be html5, enable doclint html checks, remove jtidy Current javadocs declare an HTML5 doctype: !DOCTYPE HTML. Some HTML5 features are used, but unfortunately also some constructs that do not exist in HTML5 are used as well. Because of this, we have no checking of any html syntax. jtidy is disabled because it works with html4. doclint is disabled because it works with html5. our docs are neither. javadoc "doclint" feature can efficiently check that the html isn't crazy. we just have to fix really ancient removed/deprecated stuff (such as use of tt tag). This enables the html checking in both ant and gradle. The docs are fixed via straightforward transformations. One exception is table cellpadding, for this some helper CSS classes were added to make the transition easier (since it must apply padding to inner th/td, not possible inline). I added TODOs, we should clean this up. Most problems look like they may have been generated from a GUI or similar and not a human.	2020-02-06 22:30:52 -05:00
Mike	abd282d258	LUCENE-9142 Refactor IntSet operations for determinize (#1184) * LUCENE-9142 Refactor SortedIntSet for equality Split SortedIntSet into a class hierarchy to make comparisons to FrozenIntSet more meaningful. Use Arrays.equals for more efficient comparison. Add tests for IntSet to verify correctness.	2020-02-06 12:16:45 -08:00
Tomoko Uchida	f3cd1dbde3	LUCENE-9077: Force locale en_US on Javadoc task (workaround for JDK-8222793)	2020-02-07 01:36:45 +09:00
Adrien Grand	85dba7356f	LUCENE-9147: Make sure temporary files get deleted on all code paths.	2020-02-06 17:13:28 +01:00
Robert Muir	7f4560c59a	LUCENE-9199: allow building javadocs on java 13+	2020-02-06 10:39:41 -05:00
Alan Woodward	7c1ba1aebe	LUCENE-9099: Correctly handle repeats in ORDERED and UNORDERED intervals (#1097) If you have repeating intervals in an ordered or unordered interval source, you currently get somewhat confusing behaviour: * `ORDERED(a, a, b)` will return an extra interval over just a b if it first matches a a b, meaning that you can get incorrect results if used in a `CONTAINING` filter - `CONTAINING(ORDERED(x, y), ORDERED(a, a, b))` will match on the document `a x a b y` * `UNORDERED(a, a)` will match on documents that contain just a single a. This commit adds a RepeatingIntervalsSource that correctly handles repeats within ordered and unordered sources. It also changes the way that gaps are calculated within ordered and unordered sources, by using a new width() method on IntervalIterator. The default implementation just returns end() - start() + 1, but RepeatingIntervalsSource instead returns the sum of the widths of its child iterators. This preserves maxgaps filtering on ordered and unordered sources that contain repeats. In order to correctly handle matches in this scenario, IntervalsSource#matches now always returns an explicit IntervalsMatchesIterator rather than a plain MatchesIterator, which adds gaps() and width() methods so that submatches can be combined in the same way that subiterators are. Extra checks have been added to checkIntervals() to ensure that the same intervals are returned by both iterator and matches, and a fix to DisjunctionIntervalIterator#matches() is also included - DisjunctionIntervalIterator minimizes its intervals, while MatchesUtils.disjunction does not, so there was a discrepancy between the two methods.	2020-02-06 14:44:47 +00:00
Adrien Grand	fdf5ade727	LUCENE-9147: Fix codec excludes.	2020-02-06 10:34:03 +01:00
Adrien Grand	1b882246d7	LUCENE-9147: Avoid reusing file names with FileSwitchDirectory or NRTCachingDirectory and IOContext randomization.	2020-02-06 08:27:33 +01:00
Robert Muir	63be99bf12	SOLR-14118: default embedded zookeeper port to localhost	2020-02-05 21:33:37 -05:00
Robert Muir	196ec5f4a8	LUCENE-9206: add forbidden api exclusion to new class	2020-02-05 20:30:18 -05:00
Marcus	bc5f837344	SOLR-14147 change the Security manager to default to true. (#1141) * change the Security manager to default. * update the ref-guide. * uncomment init scripts update changes. * changed the ref guide and re-commented file. * remove added comment. * modified shell script. * removed comment in windows file. Signed-off-by: marcussorealheis <marcuseagan@gmail.com> * bashism and fix windows * remove space Signed-off-by: marcussorealheis <marcuseagan@gmail.com>	2020-02-05 19:17:55 -05:00
Robert Muir	93b83f635d	LUCENE-9206: Improve IndexMergeTool defaults and options IndexMergeTool previously had no options and always forceMerge(1) the resulting index. This can result in wasted work and confusing performance (unbalancing the index). Instead the default is to not do anything, except merges from the merge policy.	2020-02-05 16:31:07 -05:00
Houston Putman	80ed8c281b	SOLR-13887: Use the default idleTimeout instead of 0 for HTTP2 (#991)	2020-02-05 11:15:37 -08:00
Chris Hostetter	c5d0391df9	SOLR-14241: New delete() Stream Decorator	2020-02-05 10:49:24 -07:00
Adrien Grand	136dcbdbbc	LUCENE-9147: Move the stored fields index off-heap. (#1179) This replaces the index of stored fields and term vectors with two `DirectMonotonic` arrays. `DirectMonotonicWriter` needs to know the number of values to write up-front, so incoming doc IDs and file pointers are buffered on disk using temporary files that never get fsynced, but have index headers and footers to make sure any corruption in these files wouldn't propagate to the index. `DirectMonotonicReader` gets a specialized `binarySearch` implementation that leverages the metadata in order to avoid going to the IndexInput as often as possible. Actually in the common case, it would only go to a single sub `DirectReader` which, combined with the size of blocks of 1k values, helps bound the number of page faults to 2.	2020-02-05 18:35:08 +01:00
Adrien Grand	fe349ddcf2	SOLR-14242: HdfsDirectory#createTempOutput. (#1240)	2020-02-05 16:38:53 +01:00
Mike McCandless	47386f8cca	LUCENE-9200: consistently use double (not float) math for TieredMergePolicy's decisions, to fix a corner-case bug uncovered by randomized tests	2020-02-05 09:51:31 -05:00
Adrien Grand	2d8428ec2e	SOLR-14238: Fix HdfsDirectory to no longer overwrite existing files. (#1237)	2020-02-04 19:35:15 +01:00
Tomas Fernandez Lobbe	bb90569f1d	SOLR-14219: Revert changes in OverseerSolrResponse and move serialization (#1227) SOLR-14095 introduced an issue for rolling restarts (incompatible Java serialization). This change fixes the compatibility issue while keeping the functionality in SOLR-14095	2020-02-04 10:26:57 -08:00
Munendra S N	c91dd9d0e4	SOLR-14090: fix delete-copy-field when source is dynamic field	2020-02-04 21:33:31 +05:30
Munendra S N	4eff9c9b5e	SOLR-10567: add support for DateRangeField in JSON facet range	2020-02-04 21:26:40 +05:30
Erick Erickson	b0bb299dc4	LUCENE-9134: Port ant-regenerate tasks to Gradle build (#1230) LUCENE-9134: Port ant-regenerate tasks to Gradle build (Solr javacc)	2020-02-04 09:16:38 -05:00
Ignacio Vera	641680fbf1	LUCENE-9197: fix wrong implementation on Point2D#withinTriangle (#1228)	2020-02-04 07:10:08 +01:00
Erick Erickson	d3ac1329a3	LUCENE-8656: Deprecations in FuzzyQuery (#1229) LUCENE-8656: Deprecations in FuzzyQuery Closes #1229	2020-02-03 08:52:33 -05:00
Mikhail Khludnev	d8bc9bcfcf	SOLR-12325: uniqueBlock({!v=foo:bar})	2020-02-02 15:15:35 +03:00
Jan Høydahl	16b8d50284	SOLR-14221: Upgrade restlet to version 2.4.0 (#1211)	2020-02-02 11:35:14 +01:00
Kazuaki Hiraga	b457c2ee2e	LUCENE-9123: Add new JapaneseTokenizer constructors with discardCompoundToken option to control whether the tokenizer emits original tokens when the mode is not NORMAL.	2020-02-01 14:51:09 +09:00
Munendra S N	a2c53dad72	fix typo in schema-api documentation	2020-02-01 10:21:52 +05:30
Erick Erickson	5253c0cb74	LUCENE-9134 Port ant-regenerate tasks to Gradle build (#1226) LUCENE-9134: Port ant-regenerate tasks to Gradle build Javacc sub-task. Closes #1226	2020-01-31 17:04:10 -05:00
Robert Muir	7382375d8a	support ECJ linting on newer JDK versions The entire precommit task will still fail with unsupported java version (subsequent checks do not support the newer javadocs format). But this allows the ECJ linter to run, which checks for things such as unused imports.	2020-01-31 14:16:04 -05:00
Joel Bernstein	db78f6cd00	SOLR-14139: Support backtick phrase queries in Streaming Expressions	2020-01-31 11:54:14 -05:00
Christine Poerschke	0c1b19a321	LUCENE-8530: fix some 'rawtypes' javac warnings	2020-01-31 16:40:55 +00:00
Jason Gerlowski	719b38c8d8	SOLR-13892: Add 'top-level' docValues Join implementation (#1171)	2020-01-31 11:21:01 -05:00
Robert Muir	9ceaff913e	LUCENE-9195: more slow tests fixes	2020-01-31 07:57:34 -05:00
Robert Muir	ed7f507c3c	LUCENE-9193: fix documentation typo for gradle tests	2020-01-30 23:54:31 -05:00
Chris Hostetter	517438e356	New /stream test cases showing authn+authz edge cases in cloud mode This triggers various places in the Streaming Expressions code that use background threads to confirm that the expected credentials (or lack of) are propagated along. Test currently has comments + workarounds for 2 known client issues: - SOLR-14226: SolrStream reports AuthN/AuthZ failures (401|403) as IOException w/o details - SOLR-14222: CloudSolrClient converts (update) 403 error to 500 error	2020-01-30 10:01:03 -07:00
Robert Muir	4b5105e167	LUCENE-9193: heap allocations for tests.profile Can be a bit noisier than cpu sampling, due to how threads are allocated in tests... maybe we can improve that in the future.	2020-01-30 08:29:10 -05:00
Dawid Weiss	3a8ed5e8ed	LUCENE-9134: add python-based regeneration of HTMLCharacterEntities.jflex inside jflexHTMLStripCharFilter.	2020-01-30 13:48:16 +01:00
Dawid Weiss	043dd207b6	LUCENE-9080: this jflex file got corrupted somehow during previous commit. I regenerated it with ant, along with the final java file. I also added a crlf normalization, encoding and forced-regeneration to ant because it didn't work before.	2020-01-30 13:09:47 +01:00
Adrien Grand	13e2094804	LUCENE-4702: Improve performance for fuzzy queries. Fuzzy queries with an edit distance of 1 or 2 must visit all blocks whose prefix length is 1 or 2. By not compressing those, we can trade very little space (a couple MBs in the case of the wikibigall index) for better query efficiency.	2020-01-30 10:37:39 +01:00
Ignacio Vera	a9482911a8	LUCENE-9141: Simplify LatLonShapeXQuery API by adding a new abstract class called LatLonGeometry. (#1170)	2020-01-30 08:03:22 +01:00
Robert Muir	29469b454f	LUCENE-9192: speed up more slow tests	2020-01-29 14:31:32 -05:00
Ignacio Vera	c98229948a	LUCENE-9152: Improve line intersection detection for polygons (#1187)	2020-01-29 19:24:51 +01:00
Dawid Weiss	e25dac085f	LUCENE-9134: this adds initial javacc support (without follow-up tweaks required to make the sources identical as those generated by ant).	2020-01-29 17:02:59 +01:00
Adrien Grand	7941d109bd	SOLR-13897: Fix precommit.	2020-01-28 20:11:47 +01:00
Adrien Grand	92b684c647	LUCENE-9161: DirectMonotonicWriter checks for overflows. (#1197)	2020-01-28 19:06:53 +01:00
Adrien Grand	6eb8834a57	LUCENE-4702: Reduce terms dictionary compression overhead. (#1216) Changes include: - Removed LZ4 compression of suffix lengths which didn't save much space anyway. - For stats, LZ4 was only really used for run-length compression of terms whose docFreq is 1. This has been replaced by explicit run-length compression. - Since we only use LZ4 for suffix bytes if the compression ratio is < 75%, we now only try LZ4 out if the average suffix length is greater than 6, in order to reduce index-time overhead.	2020-01-28 18:38:30 +01:00
Robert Muir	4773574578	LUCENE-9189: TestIndexWriterDelete.testDeletesOnDiskFull can run for minutes The issue is that MockDirectoryWrapper's disk full check is horribly inefficient. On every writeByte/etc, it totally recomputes disk space across all files. This means it calls listAll() on the underlying Directory (which sorts all the underlying files), then sums up fileLength() for each of those files. This leads to many pathological cases in the disk full tests... but the number of tests impacted by this is minimal, and the logic is scary.	2020-01-28 12:24:31 -05:00
Robert Muir	3bcc97c8eb	LUCENE-9186: remove linefiledocs usage from BaseTokenStreamTestCase	2020-01-28 11:55:51 -05:00
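
The LUCENE-9099 entry above describes a width() method whose default is end() - start() + 1, with RepeatingIntervalsSource returning the sum of its child widths instead. A minimal Python sketch of that arithmetic follows. The names `Interval`, `repeating_width`, and `gaps` are hypothetical and not Lucene API, and the gap formula (enclosing span minus summed child widths) is an assumption about how the widths feed into gap counting, not something the commit message states.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Interval:
    """Hypothetical stand-in for one matched interval (inclusive positions)."""
    start: int
    end: int

    def width(self) -> int:
        # Default width as quoted in the commit message: end() - start() + 1.
        return self.end - self.start + 1


def repeating_width(children: Sequence[Interval]) -> int:
    # Per the commit message, a repeating source reports the sum of the
    # widths of its children rather than the width of the enclosing span.
    return sum(child.width() for child in children)


def gaps(children: Sequence[Interval]) -> int:
    # Assumption: gaps are the positions inside the enclosing span that are
    # not covered by any child width.
    span_start = min(child.start for child in children)
    span_end = max(child.end for child in children)
    return (span_end - span_start + 1) - sum(child.width() for child in children)


# ORDERED(a, a, b) matched against "a x a b": a at 0, a at 2, b at 3.
matches = [Interval(0, 0), Interval(2, 2), Interval(3, 3)]
print(repeating_width(matches[:2]))  # the two a's count as width 2, not 3
print(gaps(matches))                 # one uncovered position (the x)
```

Under these assumptions, counting the repeated term by its summed child widths keeps the x between the two a's visible as a gap, which is what a maxgaps filter would need to see.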

33538 Commits
