11754 Commits

Author SHA1 Message Date
Ignacio Vera 286d22717b
LUCENE-9225: Rectangle extends LatLonGeometry so it can be used in a geometry collection (#1258) 2020-03-03 07:36:44 +01:00
Ignacio Vera c313365c5f
LUCENE-9251: Filter equal edges with different value on isEdgeFromPolygon (#1290)
Fix bug in the polygon tessellator where edges with different values for #isEdgeFromPolygon were not filtered out properly
2020-03-03 07:07:34 +01:00
Ignacio Vera b732ce7002
LUCENE-9239: Circle2D#WithinTriangle detects properly if a triangle is Within distance. (#1280) 2020-03-03 06:44:35 +01:00
Michael Sokolov e308e53873 Add CHANGES entry for LUCENE-8962 2020-03-02 18:34:13 -05:00
msfroh f017ae465e
LUCENE-8962: Fix intermittent test failures (#1307)
1. TestIndexWriterMergePolicy.testMergeOnCommit will fail if the last
   commit (the one that should trigger the full merge) doesn't have any
   pending changes (which could occur if the last indexing thread
   commits at the end). We can fix that by adding one more document
   before that commit.
2. The previous implementation was throwing IOException if the commit
   thread gets interrupted while waiting for merges to complete. This
   violates IndexWriter's documented behavior of throwing
   ThreadInterruptedException.
2020-03-02 18:29:12 -05:00
Nicholas Knize a6e80d004d LUCENE-9150: Restore support for dynamic PlanetModel in spatial3d 2020-03-02 16:06:17 -06:00
msfroh 043c5dff6f
LUCENE-8962: Add ability to selectively merge on commit (#1155)
* LUCENE-8962: Add ability to selectively merge on commit

This adds a new "findCommitMerges" method to MergePolicy, which can
specify merges to be executed before the
IndexWriter.prepareCommitInternal method returns.

If we have many index writer threads, they will flush their DWPT buffers
on commit, resulting in many small segments, which can be merged before
the commit returns.

* Add missing Javadoc

* Fix incorrect comment

* Refactoring and fix intermittent test failure

1. Made some changes to the callback to update toCommit, leveraging
SegmentInfos.applyMergeChanges.
2. I realized that we'll never end up with 0 registered merges, because
we throw an exception if we fail to register a merge.
3. Moved the IndexWriterEvents.beginMergeOnCommit notification to before
we call MergeScheduler.merge, since we may not be merging on another
thread.
4. There was an intermittent test failure due to randomness in the time
it takes for merges to complete. Before doing the final commit, we wait
for pending merges to finish. We may still end up abandoning the final
merge, but we can detect that and assert that either the merge was
abandoned (and we have > 1 segment) or we did merge down to 1 segment.

* Fix typo

* Fix/improve comments based on PR feedback

* More comment improvements from PR feedback

* Rename method and add new MergeTrigger

1. Renamed findCommitMerges -> findFullFlushMerges.
2. Added MergeTrigger.COMMIT, passed to findFullFlushMerges and to
   MergeScheduler when merging on commit.

* Update renamed method name in strings and comments
2020-03-02 12:19:47 -05:00
Namgyu Kim b2dbd18f96
LUCENE-9253: Support custom dictionaries in KoreanTokenizer
Signed-off-by: Namgyu Kim <namgyu@apache.org>
2020-03-03 02:11:44 +09:00
Adrien Grand f4b5069c1f LUCENE-9247: Fix test failures with ExtraFS. 2020-03-02 08:59:13 +01:00
Ignacio Vera c653c04bb1
LUCENE-9243: Add fudge factor when creating a bounding box of a xycircle (#1278) 2020-03-02 06:56:29 +01:00
Adrien Grand c929b65c81 LUCENE-9247: Exclude `write.lock` from files whose integrity is expected to be verified. 2020-02-29 08:46:16 +01:00
Adrien Grand 30944d3520 LUCENE-9247: Fix class visibility issue discovered when backporting. 2020-02-28 15:34:36 +01:00
Bruno Roustant 99af698107
LUCENE-9237: Faster UniformSplit intersect TermsEnum.
Closes #1270
2020-02-28 14:52:19 +01:00
Adrien Grand d10db0d98f LUCENE-9247: Add missing commit. 2020-02-28 14:48:44 +01:00
Adrien Grand cd984e2dc0
LUCENE-9247: Add tests for `checkIntegrity`. (#1284)
This adds a test to `BaseIndexFileFormatTestCase` that the combination
of opening a reader and calling `checkIntegrity` on it reads all bytes
of all files (including index headers and footers). This would help
detect most cases when `checkIntegrity` is not implemented correctly.
2020-02-28 14:19:56 +01:00
Adrien Grand ebdfdaed9f
LUCENE-9246: Remove `dOff` argument from `LZ4#decompress`. (#1283)
It is always set to 0 at call sites.
2020-02-28 11:26:56 +01:00
Bruno Roustant e0164d1ac8
LUCENE-9245: Reduce AutomatonTermsEnum memory usage.
Closes #1281
2020-02-28 10:42:06 +01:00
Ignacio Vera 988ce9bff7
LUCENE-9250: Add support for Circle2d#intersectsLine around the dateline. (#1289) 2020-02-28 10:22:27 +01:00
Michael Sokolov 294b8d4ee1 LUCENE-9202: refactor leaf collectors in TopFieldCollector 2020-02-27 08:05:49 -05:00
Cao Manh Dat 666bd493c8 SOLR-14286: Upgrade Jaeger to 1.1.0 2020-02-27 14:51:45 +07:00
Mike McCandless 61e0b4cd87 LUCENE-9252: fix javac linter warnings in spatial-extras (thanks Andras Salamon) 2020-02-26 09:34:58 -05:00
Alan Woodward 98dafe2e10
LUCENE-9207: Don't build span queries in QueryBuilder (#1239)
QueryBuilder currently has special logic for graph phrase queries with no slop,
constructing a spanquery that attempts to follow all paths using a combination of
OR and NEAR queries. However, this type of query has known bugs (LUCENE-7398).
This commit removes this logic and just builds a disjunction of phrase queries, one 
phrase per path.
2020-02-26 14:32:34 +00:00
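The idea behind LUCENE-9207 ("a disjunction of phrase queries, one phrase per path") can be illustrated outside of Lucene. The following is a minimal sketch, not the QueryBuilder implementation: the token graph is modeled as a plain dictionary, and `enumerate_phrases` and `to_query_string` are hypothetical helper names chosen for this example.

```python
def enumerate_phrases(edges, start, end):
    """Return every path from start to end as a list of terms.

    edges maps a graph node to a list of (term, next_node) pairs.
    This representation is an assumption made for illustration only.
    """
    if start == end:
        return [[]]
    phrases = []
    for term, next_node in edges.get(start, []):
        for tail in enumerate_phrases(edges, next_node, end):
            phrases.append([term] + tail)
    return phrases


def to_query_string(phrases):
    """Join the phrases into a disjunction, one quoted phrase per path."""
    return " OR ".join('"' + " ".join(p) + '"' for p in phrases)


# "wifi" has the multi-word synonym "wi fi", so the graph has two paths.
graph = {
    0: [("fast", 1)],
    1: [("wifi", 3), ("wi", 2)],
    2: [("fi", 3)],
    3: [("network", 4)],
}
phrases = enumerate_phrases(graph, 0, 4)
print(to_query_string(phrases))
```

Each path through the graph becomes an independent phrase, so no span query has to follow several paths at once.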
Alan Woodward b4c2e279a9 LUCENE-9212: Fix precommit 2020-02-24 14:17:04 +00:00
Alan Woodward 19fe1eee68 LUCENE-9212: Remove deprecated Intervals.multiterm() methods 2020-02-24 11:16:37 +00:00
Alan Woodward ffb7cafe93 LUCENE-9212: Intervals.multiterm() should take CompiledAutomaton 2020-02-24 11:08:48 +00:00
Alessandro Benedetti 663611c99c
[SOLR-12238] Synonym Queries boost (#357)
SOLR-12238: Handle boosts in QueryBuilder

QueryBuilder now detects per-term boosts supplied by a BoostAttribute when
building queries using a TokenStream.  This commit also adds a DelimitedBoostTokenFilter
that parses boosts from tokens using a delimiter token, and exposes this in Solr
2020-02-24 10:29:41 +00:00
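The delimiter-based boost parsing described in SOLR-12238 can be sketched as follows. This is only an illustration of the idea, not the DelimitedBoostTokenFilter code: the function name, the `|` default delimiter, and the handling of tokens without a delimiter are all assumptions.

```python
def parse_boosted_token(token, delimiter="|", default_boost=1.0):
    """Split 'term<delimiter>boost' into (term, boost).

    Tokens without the delimiter keep the default boost. The delimiter
    and default are assumptions for this sketch.
    """
    term, sep, boost = token.rpartition(delimiter)
    if not sep:
        return token, default_boost
    return term, float(boost)


print(parse_boosted_token("television|0.8"))
print(parse_boosted_token("tv"))
```

A query builder could then attach the parsed boost to the term query it creates for that token.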
Ignacio Vera 88dd1c3f3d
LUCENE-9238: Add new XYPointField, queries and sorting capabilities (#1272)
New XYPointField field and Queries for indexing, searching and sorting cartesian points.
2020-02-21 11:26:30 +01:00
Robert Muir 9302eee1e0
LUCENE-9235: upgrade all python to python3
Die, python2, die.

Some generated .java files change (parameterized automata for
spell-correction).

This is because the order of python dictionaries was not well-defined
previously. A sort() was added so that the python code now generates
reproducible output (Thanks @mikemccand).

So we'll suffer a change once, but the automata are equivalent. If you
run the script again you should not see source code changes.

The relevant unit tests are exhaustive (if you trust the paper!), so we can
be confident it does not break things, even though it looks very scary.
2020-02-20 21:27:38 -05:00
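The reproducibility fix mentioned in LUCENE-9235 (adding a sort so that generated output no longer depends on dictionary order) follows a common pattern. A minimal sketch, assuming a hypothetical generator that emits one line per dictionary entry; `generate_cases` is not a function from the Lucene scripts:

```python
def generate_cases(transitions):
    """Emit one generated source line per entry, in sorted key order.

    Sorting makes the output independent of the order in which the
    dictionary was built.
    """
    lines = []
    for state, target in sorted(transitions.items()):
        lines.append(f"case {state}: return {target};")
    return "\n".join(lines)


# Same content, different insertion order: the output is identical.
first = {3: 7, 1: 4, 2: 9}
second = {2: 9, 3: 7, 1: 4}
print(generate_cases(first) == generate_cases(second))
```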
Nhat Nguyen a0b8f5c7c2 LUCENE-9228: Sort dvUpdates by terms before apply
With this change, we sort dvUpdates in the term order before applying if
they all update a single field to the same value. This optimization can
reduce the flush time by around 20% for the docValues update use cases.
2020-02-20 13:18:10 -05:00
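The condition described in LUCENE-9228 (sort only when all updates target a single field with the same value) can be sketched like this. The tuple layout and the name `order_updates` are assumptions for illustration; the actual change operates on Lucene's internal update buffers.

```python
def order_updates(updates):
    """Sort (term, field, value) updates by term when it is safe to do so.

    If every update sets the same field to the same value, the order in
    which they are applied does not change the result, so they can be
    sorted by term. Otherwise the original order is kept.
    """
    targets = {(field, value) for _, field, value in updates}
    if len(targets) == 1:
        return sorted(updates, key=lambda update: update[0])
    return list(updates)


same = [("id:3", "price", 10), ("id:1", "price", 10), ("id:2", "price", 10)]
mixed = [("id:3", "price", 10), ("id:1", "price", 20)]
print(order_updates(same))
print(order_updates(mixed))
```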
iverase 054b3be627 LUCENE-8707: Fix test bug. When the bounding box of a triangle
is within a circle, the triangle is within the circle as well.
2020-02-19 18:21:03 +01:00
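The geometric fact behind the LUCENE-8707 test fix can be checked in the plane: a circle is convex, so if all four corners of a triangle's bounding box lie inside it, the triangle does too. The sketch below assumes Cartesian coordinates only (it does not model lat/lon distance), and all function names are hypothetical.

```python
import math


def point_in_circle(px, py, cx, cy, radius):
    """True if the point lies inside or on the circle."""
    return math.hypot(px - cx, py - cy) <= radius


def bbox_in_circle(points, cx, cy, radius):
    """True if all four corners of the points' bounding box are in the circle."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    corners = [(x, y) for x in (min(xs), max(xs)) for y in (min(ys), max(ys))]
    return all(point_in_circle(x, y, cx, cy, radius) for x, y in corners)


def triangle_in_circle(triangle, cx, cy, radius):
    """True if all three vertices are in the circle (enough, by convexity)."""
    return all(point_in_circle(x, y, cx, cy, radius) for x, y in triangle)


triangle = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
# Bounding box inside the circle implies the triangle is inside.
print(bbox_in_circle(triangle, 0.5, 0.5, 1.0), triangle_in_circle(triangle, 0.5, 0.5, 1.0))
# The converse does not hold: the box corner (1, 1) falls outside here.
print(bbox_in_circle(triangle, 0.3, 0.3, 0.8), triangle_in_circle(triangle, 0.3, 0.3, 0.8))
```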
Ignacio Vera d48bafb299
LUCENE-8707: Add LatLonShape and XYShape distance query (#587) 2020-02-19 16:03:30 +01:00
markharwood 79a4a680e7 Test fix - new binary doc values test could use invalid values. 2020-02-19 09:14:14 +00:00
Robert Muir b9a569e7be
LUCENE-9230: explicitly call python version we want from builds
On newer linux distros, at least, 'python' now means python3. So
we can't rely on what version of python it will invoke (at least for a
few years).

For example in Fedora Linux:

https://fedoraproject.org/wiki/Changes/Python_means_Python3

For python2.x code, explicitly call 'python2.7' and for python3.x code,
explicitly call 'python3'.

Ant variable names are cleaned up, e.g. 'python.exe' is renamed to
'python2.exe' and 'python32.exe' is renamed to 'python3.exe'. This also
makes it easy to identify remaining python 2.x code that should be
migrated to python 3.x
2020-02-18 18:58:17 -05:00
markharwood ce2959fe4c
LUCENE-9211 Add compression for Binary doc value fields (#1234)
Stores groups of 32 binary doc values in LZ4-compressed blocks.
2020-02-18 14:02:42 +00:00
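The block layout described in LUCENE-9211 (groups of 32 values compressed together) can be sketched as below. Two things here are assumptions: `zlib` stands in for LZ4 because the Python standard library has no LZ4 codec, and the per-block list of value lengths is an invented layout, not the on-disk format used by Lucene.

```python
import zlib

BLOCK_SIZE = 32  # number of values per compressed block, as in the commit message


def compress_blocks(values):
    """Group byte strings into blocks of BLOCK_SIZE and compress each block.

    Each block is stored as (lengths, compressed_payload). zlib is a
    stand-in for LZ4 in this sketch.
    """
    blocks = []
    for i in range(0, len(values), BLOCK_SIZE):
        group = values[i:i + BLOCK_SIZE]
        lengths = [len(v) for v in group]
        blocks.append((lengths, zlib.compress(b"".join(group))))
    return blocks


def read_value(blocks, doc):
    """Decompress only the block holding doc and slice out its value."""
    lengths, payload = blocks[doc // BLOCK_SIZE]
    data = zlib.decompress(payload)
    index = doc % BLOCK_SIZE
    start = sum(lengths[:index])
    return data[start:start + lengths[index]]


values = [f"value-{i}".encode() for i in range(70)]
blocks = compress_blocks(values)
print(len(blocks), read_value(blocks, 45))
```

Reading one value costs one block decompression, which is the trade-off of grouping values: better compression in exchange for some extra work per lookup.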
Robert Muir 0203815ab2
LUCENE-9220: regenerate all stemmers/stopwords/test data from snowball 2.0 (#1262)
Previous situation:

* The snowball base classes (Among, SnowballProgram, etc) had accumulated local performance-related changes. There was a task that would also "patch" generated classes (e.g. GermanStemmer) after-the-fact.
* Snowball classes had many "non-changes" from the original such as removal of tabs, addition of javadocs, license headers, etc.
* Snowball test data (inputs and expected stems) was incorporated into lucene testing, but this was maintained manually. Also files had become large, making the test too slow (Nightly).
* Snowball stopwords lists from their website were manually maintained. In some cases encoding fixes were manually applied.
* Some generated stemmers (such as Estonian and Armenian) exist in lucene, but have no corresponding `.sbl` file in snowball sources at all.

Besides this mess, snowball project is "moving along" and acquiring new languages, adding non-BSD-licensed test data, huge test data, and other complexity. So it is time to automate the integration better.

New situation:

* Lucene has a `gradle snowball` regeneration task. It works on Linux or Mac only. It checks out their repos, applies the `snowball.patch` in our repository, compiles snowball stemmers, regenerates all java code, applies any adjustments so that our build is happy.
* Test data is automatically regenerated from the commit hash of the snowball test data repository. Not all languages are tested from their data: only where the license is simple BSD. Test data is also (deterministically) sampled, so that we don't have huge files. We just want to make sure our integration works.
* Randomized tests are still set to test every language with generated fake words. The regeneration task ensures all languages get tested (it writes a simple text file list of them).
* Stopword files are automatically regenerated from the commit hash of the snowball website repository.
* The regeneration procedure is idempotent. This way when stuff does change, you know exactly what happened. For example if test data changes to a different license, you may see a git deletion. Or if a new language/stopwords/test data gets added, you will see git additions.
2020-02-17 12:38:01 -05:00
Ignacio Vera ebec456602
Return CELL_CROSSES_QUERY when point inside the triangle (#1259) 2020-02-14 17:06:33 +01:00
Ignacio Vera 4a54ffb553
LUCENE-9218: XYGeometries should expose values as floats (#1252) 2020-02-14 11:39:10 +01:00
Adrien Grand 5cbe58f22c
Add back assertions removed by LUCENE-9187. (#1236)
This time they would only apply to TestFastLZ4/TestHighLZ4 and avoid slowing
down all tests.
2020-02-14 10:37:06 +01:00
Erick Erickson f9357ab0d2
LUCENE-9134: Port ant-regenerate tasks to Gradle build (util and packed) (#1251)
* LUCENE-9134: Port ant-regenerate tasks to Gradle build
2020-02-11 18:56:11 -05:00
Ignacio Vera 87421d7231 LUCENE-9216: Make sure we index LEAST_DOUBLE_VALUE (#1246) 2020-02-10 11:50:08 +01:00
Robert Muir f41eabdc5f
LUCENE-8279: fix javadocs wrong header levels and accessibility issues
Java 13 adds a new doclint check under "accessibility" that the html
header nesting level isn't crazy.

Many are incorrect because the html4-style javadocs had horrible
font-sizes, so developers used the wrong header level to work around it.
This is no issue in trunk (always html5).

Java recommends against using such structured tags at all in javadocs,
but that is a more involved change: this just "shifts" header levels
in documents to be correct.
2020-02-08 10:00:00 -05:00
Robert Muir 69f26d099e
LUCENE-9213: fix documentation-lint (and finally precommit) to work on java 12 and 13
the "missing javadocs" checker needed tweaks to work with the format
changes of java 13.

As a followup we may investigate javadoc (maybe the new doclet api). It
has its own missing checks too now, but they are black vs white (either
fully documented or not checked), whereas this python tool allows us to
"improve", e.g. enforce that all classes have doc, even if all
methods do not yet.
2020-02-07 17:18:26 -05:00
Nicholas Knize 206a70e7b7 LUCENE-9149: Increase data dimension limit in BKD 2020-02-07 16:08:14 -06:00
Ignacio Vera 73dbf6d061
LUCENE-9194: Simplify XYShapeXQuery API by adding a new abstract class called XYGeometry 2020-02-07 19:29:13 +01:00
Adrien Grand c0d1f30236 SOLR-12930: Exclude dev-docs from binary archive. 2020-02-07 10:37:43 +01:00
Robert Muir 860115e450
LUCENE-9209: revert changes to test html file, not intended 2020-02-06 22:40:40 -05:00
Robert Muir 0d339043e3
LUCENE-9209: fix javadocs to be html5, enable doclint html checks, remove jtidy
Current javadocs declare an HTML5 doctype: <!DOCTYPE HTML>. Some HTML5
features are used, but unfortunately also some constructs that do not
exist in HTML5 are used as well.

Because of this, we have no checking of any html syntax. jtidy is
disabled because it works with html4. doclint is disabled because it
works with html5. our docs are neither.

javadoc "doclint" feature can efficiently check that the html isn't
crazy. we just have to fix really ancient removed/deprecated stuff
(such as use of tt tag).

This enables the html checking in both ant and gradle. The docs are
fixed via straightforward transformations.

One exception is table cellpadding, for this some helper CSS classes
were added to make the transition easier (since it must apply padding
to inner th/td, not possible inline). I added TODOs, we should clean
this up. Most problems look like they may have been generated from a
GUI or similar and not a human.
2020-02-06 22:30:52 -05:00
Mike abd282d258
LUCENE-9142 Refactor IntSet operations for determinize (#1184)
* LUCENE-9142 Refactor SortedIntSet for equality

Split SortedIntSet into a class hierarchy to make comparisons to
FrozenIntSet more meaningful. Use Arrays.equals for more efficient
comparison. Add tests for IntSet to verify correctness.
2020-02-06 12:16:45 -08:00
Adrien Grand 85dba7356f LUCENE-9147: Make sure temporary files get deleted on all code paths. 2020-02-06 17:13:28 +01:00
Robert Muir 7f4560c59a
LUCENE-9199: allow building javadocs on java 13+ 2020-02-06 10:39:41 -05:00