Commit Graph

12029 Commits

Author SHA1 Message Date
iverase 9340e56551 Add back-compat indices for 8.5.1 2020-04-16 09:52:08 +02:00
iverase b7b85f3e75 Move bugfix entries to version 8.5.1 2020-04-16 09:36:55 +02:00
iverase 8a88ab0e7c Add bugfix version 8.5.1 2020-04-16 09:27:06 +02:00
Adrien Grand 0aa4ba7ccb
LUCENE-9260: Verify checksums of CFS files. (#1311) 2020-04-15 15:10:59 +02:00
Adrien Grand aa605b3c70
LUCENE-9307: Remove the ability to set the buffer size dynamically on BufferedIndexInput (#1415) 2020-04-15 15:10:11 +02:00
Simon Willnauer 47bc18478a
Move DWPT private deletes out of FrozenBufferedUpdates (#1431)
This change moves the deletes tracked by FrozenBufferedUpdates that
are private to the DWPT and never used in a global context out of
FrozenBufferedUpdates.
2020-04-14 21:37:19 +02:00
Simon Willnauer 18af6325ed
LUCENE-9304: Fix IW#getMaxCompletedSequenceNumber() (#1427)
After the recent refactoring in LUCENE-9304, `IW#getMaxCompletedSequenceNumber()` might
return values that belong to non-completed operations if a full flush is running and a new delete
queue is already in place, but not all DWPTs that participate in the full flush have finished their
in-flight operations. This caused rare failures in
`TestControlledRealTimeReopenThread#testControlledRealTimeReopenThread` where
documents are not actually visible given the max completed seqNo. This change streamlines
the delete queue advance, adds a dedicated test case, and ensures that a delete queue's
sequence ID space is never exhausted.
2020-04-14 19:39:23 +02:00
Julie Tibshirani 3236d38c8b
Avoid using a raw Arc type. (#1429)
This fixes some compiler warnings that popped up recently.
2020-04-14 09:23:12 +02:00
Simon Willnauer f5457b82a1 Suppress Direct postings for TestIndexWriterThreadsToSegments to prevent OOM on Nightly 2020-04-13 13:44:15 +02:00
Dawid Weiss 616ec987a9
Do a bit count on 8 bytes from a long directly instead of reading 8 bytes from the reader. Byte order doesn't matter here. (#1426) 2020-04-13 13:37:25 +02:00
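The claim in the commit above, that byte order does not matter for a bit count, can be checked with a small sketch. This is only an illustration, not the code from the commit: the class and method names are hypothetical, and only standard JDK calls (`Long.bitCount`, `Integer.bitCount`, `ByteBuffer`) are used.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper, not part of Lucene.
final class BitCountSketch {
    /** Counts set bits by visiting each byte separately. */
    static int bitCountPerByte(byte[] bytes) {
        int count = 0;
        for (byte b : bytes) {
            // Mask to 0..255 so sign extension does not add spurious bits.
            count += Integer.bitCount(b & 0xFF);
        }
        return count;
    }

    /** Counts set bits by reading the same 8 bytes as a single long. */
    static int bitCountAsLong(byte[] eightBytes, ByteOrder order) {
        long value = ByteBuffer.wrap(eightBytes).order(order).getLong();
        // Reordering bytes only permutes bits, so the count is unchanged.
        return Long.bitCount(value);
    }
}
```

Both byte orders give the same count as the per-byte loop, which is why a single `Long.bitCount` over the long can stand in for eight separate byte reads.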
Shalin Shekhar Mangar 13f19f6555 SOLR-9906: SolrjNamedThreadFactory is deprecated in favor of SolrNamedThreadFactory. DefaultSolrThreadFactory is removed from solr-core in favor of SolrNamedThreadFactory in solrj package and all solr-core classes now use SolrNamedThreadFactory 2020-04-13 08:16:35 +05:30
Simon Willnauer 8c1f9815db LUCENE-9309: ensure stopMerges is set under IW lock 2020-04-11 19:53:21 +02:00
Simon Willnauer 2602269f3e
LUCENE-9304: Refactor DWPTPool to pool DWPT directly (#1397)
This change removes the ThreadState indirection from DWPTPool and pools DWPT directly. The tracking information and locking semantics are mostly moved to DWPT directly, and the pool semantics have changed slightly such that DWPTs need to be checked out of the pool once they need to be flushed or aborted. This automatically grows and shrinks the number of DWPTs in the system when the number of threads grows or shrinks. Access to pooled DWPTs is more straightforward and doesn't require an ordinal. Instead, consumers can just iterate over the elements in the pool.
This allowed for removal of indirections in DWPTFlushControl like BlockedFlush, the removal of DWPTPool setter and getter in IndexWriterConfig and the addition of stronger assertions in DWPT and DW.
2020-04-11 12:23:46 +02:00
Nhat Nguyen 527e651660 LUCENE-9298: Fix TestBufferedUpdates
This test failed on Elastic CI because we did not add any term in the
loop. This commit ensures that we always add at least one docId, term
and query in the test.
2020-04-10 15:28:10 -04:00
Simon Willnauer e376582e25
LUCENE-9309: Wait for #addIndexes merges when aborting merges (#1418)
The SegmentMerger usage in IW#addIndexes(CodecReader...) might make changes
to the Directory while the IW tries to clean up files on rollback. This
causes issues like FileNotFoundExceptions when IFD tries to remove temp files.
This change adds a waiting mechanism to the abortMerges method that, in addition
to the running merges, also waits for merges in addIndexes(CodecReader...).
2020-04-10 12:55:02 +02:00
YuBinglei 2935186c5b
LUCENE-9298: Improve RAM accounting in BufferedUpdates when deleted doc IDs and terms are cleared (#1389) 2020-04-10 12:30:47 +02:00
Bruno Roustant 6bba35a709
LUCENE-9286: FST.Arc.BitTable reads directly FST bytes. Arc is lightweight again and FSTEnum traversal faster. 2020-04-09 10:36:37 +02:00
Juan Camilo Rodriguez Duran de6233976a LUCENE-8050: PerFieldDocValuesFormat should not get the DocValuesFormat on a field that has no doc values.
Closes #1408
2020-04-07 16:12:05 -04:00
Adrien Grand 529042e786 LUCENE-9271: Complete fix for setBufferSize. 2020-04-07 17:24:41 +02:00
Adrien Grand 3363e1aa48 LUCENE-9271: Fix bad assertion. 2020-04-07 16:21:33 +02:00
Adrien Grand 82692e76e0 LUCENE-9271: Move BufferedIndexInput to the ByteBuffer API.
Closes #1338
2020-04-07 13:30:09 +02:00
Ignacio Vera f018c4c813
LUCENE-9244: In 2D, a point can be shared by four leaves (#1279)
Adjust TestLucene60PointsFormat#testEstimatePointCount2Dims so it does not fail when a point is shared by multiple leaves
2020-04-07 10:41:15 +02:00
Erick Erickson e1e2085e94 SOLR-14386: Update Jetty to 9.4.27 and dropwizard-metrics version to 4.1.5 2020-04-04 16:14:57 -04:00
Jim Ferenczi b5c5ebe37c
LUCENE-9300: Fix field infos update on doc values update (#1394)
Today a doc values update creates a new field infos file that contains the original field infos updated for the new generation as well as the new fields created by the doc values update.

However existing fields are cloned through the global fields (shared in the index writer) instead of the local ones (present in the segment).
In practice this is not an issue since field numbers are shared between segments created by the same index writer.
But this assumption doesn't hold for segments created by different writers and added through IndexWriter#addIndexes(Directory).
In this case, the field number of the same field can differ between segments so any doc values update can corrupt the index
by assigning the wrong field number to an existing field in the next generation.

When this happens, queries and merges can access wrong fields without throwing any error, leading to a silent corruption in the index.

This change ensures that we preserve local field numbers when creating
a new field infos generation.
2020-04-03 13:58:05 +02:00
Atri Sharma d6cef4f39c Update CHANGES.txt 2020-04-01 20:56:19 +05:30
Atri Sharma 9ed71a6efe
LUCENE-9074: Slice Allocation Control Plane For Concurrent Searches (#1294)
This commit introduces a mechanism to control allocation of threads to slices planned for a query.
The default implementation uses the size of the executor's backlog queue to determine whether a slice should be allocated a new thread.
2020-04-01 20:42:26 +05:30
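The backlog-based decision described in LUCENE-9074 above could look roughly like the sketch below. It is not the Lucene implementation: the class, the method names, and the `maxBacklog` threshold are assumptions made for illustration, and only standard `java.util.concurrent` calls are used.

```java
import java.util.concurrent.ThreadPoolExecutor;

// Hypothetical sketch, not the class added by the commit.
final class BacklogSliceSketch {
    /** True when the executor's pending-task queue is below the assumed threshold. */
    static boolean hasCapacity(ThreadPoolExecutor executor, int maxBacklog) {
        return executor.getQueue().size() < maxBacklog;
    }

    /**
     * Hands the slice to the executor when there is capacity;
     * otherwise runs it on the calling thread to avoid growing the backlog.
     */
    static void runSlice(ThreadPoolExecutor executor, int maxBacklog, Runnable slice) {
        if (hasCapacity(executor, maxBacklog)) {
            executor.execute(slice);
        } else {
            slice.run();
        }
    }
}
```

Falling back to the caller thread is one possible policy under load; the commit message only states that the queue size drives the decision.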
Mike Drob 46d011645c LUCENE-9170: Use HTTPS when downloading wagon-ssh artifacts
Co-authored-by: Ishan Chattopadhyaya <ishan@apache.org>
2020-04-01 11:40:58 +01:00
Erick Erickson 5c2011a6fb SOLR-14367: Upgrade Tika to 1.24 2020-03-29 08:48:00 -04:00
Marvin Justice 84f6507452 LUCENE-9133 Fix for potential NPE in TermFilteredPresearcher#buildQuery 2020-03-27 16:41:01 -05:00
Ignacio Vera 8cb50a52bc LUCENE-9290 Fix TestXYPoint#testEqualsAndHashCode
closes #1375
2020-03-27 12:51:35 -05:00
Michael Sokolov 075adac598 remove LUCENE-8962 from CHANGES.txt 2020-03-25 14:34:45 -04:00
Uwe Schindler 2c7a710945
LUCENE-9281: Retire SPIClassIterator from master because Java 9+ uses different mechanism to load services when module system is used (#1360)
LUCENE-9281: Use java.util.ServiceLoader to load codec components and analysis factories to be compatible with Java Module System
2020-03-25 18:03:36 +01:00
Alan Woodward ad75916b6b LUCENE-9283: Also exclude DelimitedBoostTokenFilter from TestFactories 2020-03-25 10:04:38 +00:00
Ignacio Vera 674aba6a85
LUCENE-9287: UsageTrackingQueryCachingPolicy no longer caches DocValuesFieldExistsQuery (#1374) 2020-03-24 15:26:56 +01:00
Alan Woodward 20abf3e478 Add 8.5.0 back-compat indices 2020-03-24 12:38:49 +00:00
Ignacio Vera aaf08c9c4d
LUCENE-9275: make TestLatLonMultiPolygonShapeQueries more resilient for CONTAINS queries (#1345) 2020-03-23 07:26:48 +01:00
David Smiley 62967039dc
ivy settings: local maven repo pattern needs classifier (#1367)
note: use of this is still commented out
2020-03-21 11:51:49 -04:00
Alan Woodward 126e4a61b8 LUCENE-9283: Exclude DelimitedBoostTokenFilter from TestRandomChains 2020-03-19 13:12:04 +00:00
Munendra S N 6a59d443bc LUCENE-8908: return def val from objectVal when exists returns false
* This behavior is similar to floatVal in QueryValueSource
2020-03-18 10:41:13 +05:30
Dawid Weiss bf25e6566d LUCENE-9279: add changes entry and attribution. 2020-03-17 22:06:01 +01:00
Simon Willnauer bdb40fb164
Cleanup DWPT for readability (#1350)
DWPT had some complicated logic to account for failures etc.
This change cleans up this logic and simplifies the document processing
loop
2020-03-17 15:18:38 +01:00
Dawid Weiss 7fe6f9c57d LUCENE-9279: Update dictionary version for Ukrainian analyzer (with corrected checksums). 2020-03-16 21:47:14 +01:00
Dawid Weiss 1abed9ab22 Revert "LUCENE-9279: Update dictionary version for Ukrainian analyzer (#1354)"
This reverts commit 73b618a55c.
2020-03-16 21:19:05 +01:00
David Smiley 261e7ba86c LUCENE-8103: Revert QueryValueSource.objectVal change 2020-03-16 00:27:04 -04:00
erick 6c1d992fad SOLR-14312: SOLR-14296: Upgrade Zookeeper to 3.5.7, Update netty to 4.1.47 2020-03-15 22:11:49 -04:00
arysin 73b618a55c
LUCENE-9279: Update dictionary version for Ukrainian analyzer (#1354) 2020-03-15 22:17:05 +01:00
Michele Palmia 87b1bddf1c LUCENE-8103: Use TwoPhaseIterator in DoubleValuesSource and QueryValueSource
Fixes #1343
2020-03-15 11:50:45 -04:00
Simon Willnauer bd16620706 LUCENE-9164: fix changes entry 2020-03-15 09:04:52 +01:00
Simon Willnauer c0cf7bb4b0
LUCENE-9276: Use same code-path for updateDocuments and updateDocument (#1346)
Today we have a large amount of duplicated code that is rather
complex. This change consolidates the code paths to always
use the updateDocuments path.
2020-03-13 20:33:15 +01:00
Alan Woodward 8a940e7971 LUCENE-9171: Add CHANGES entry
SOLR-12238: Add CHANGES entry
2020-03-12 09:21:14 +00:00
Michele Palmia b1ec1cd9e0
LUCENE-9258: DocTermsIndexDocValues' range scorer didn't support multi-valued fields 2020-03-11 16:57:47 -04:00
Michele Palmia 5286098ac5
LUCENE-8849: DocValuesRewriteMethod.visit should visit subquery 2020-03-11 16:49:37 -04:00
Adrien Grand e43f8572cb LUCENE-9272: Add a CHANGES entry. 2020-03-11 20:03:15 +01:00
Namgyu Kim f0a49738ca
LUCENE-9270: Update Javadoc about normalizeEntry in the Kuromoji DictionaryBuilder 2020-03-12 02:50:36 +09:00
Adrien Grand ed59c3eb33
LUCENE-9272: Move checksum verification of the `.tip` file to `checkIntegrity()`. (#1339) 2020-03-11 18:15:29 +01:00
Simon Willnauer 79feb93bd9
LUCENE-9164: process all events before closing gracefully (#1319)
IndexWriter must process all pending events before closing the writer during rollback to prevent AlreadyClosedExceptions from being thrown during event processing which can cause the writer to be closed with a tragic event.
2020-03-10 20:40:20 +01:00
iverase 8a90806fa6 move entry in CHANGES.txt from 8.6 to 8.5 2020-03-10 10:43:44 +01:00
Simon Willnauer 44bdfb2a07
Consolidated process event logic after CRUD action (#1325)
Today we have duplicated logic on how to convert a seqNo into a real
seqNo and process events based on this. This change consolidates the logic
into a single method.
2020-03-09 18:47:43 +01:00
Tomoko Uchida c8dea5d77f LUCENE-9259: Fix wrong NGramFilterFactory argument name for preserveOriginal option 2020-03-10 01:25:14 +09:00
Ignacio Vera 03c2557681
LUCENE-9263: Fix wrong transformation of distance in meters to radians in Geo3DPoint (#1318) 2020-03-09 17:07:55 +01:00
Bruno Roustant c7cf9e8e4f
LUCENE-9254: UniformSplit supports FST off-heap.
Closes #1301
2020-03-09 16:35:42 +01:00
Nhat Nguyen 7b9f212907 LUCENE-9268: Add some random tests to IndexWriter
Add some tests that perform a set of operations randomly and
concurrently on IndexWriter.
2020-03-08 22:18:04 -04:00
Michael Sokolov 4501b3d3fd Revert "LUCENE-8962: Split test case (#1313)"
This reverts commit 90aced5a51.

Revert "LUCENE-8962: woops, remove leftover accidental copyright (darned IDEs)"

This reverts commit 3dbfd10279.

Revert "LUCENE-8962: Fix intermittent test failures"

This reverts commit a5475de57f.

Revert "LUCENE-8962: Add ability to selectively merge on commit (#1155)"

This reverts commit a1791e7714.
2020-03-08 18:27:54 -04:00
Paul Pazderski 320578274b LUCENE-9259: Fix wrong NGramFilterFactory argument name for preserveOriginal option 2020-03-07 21:32:40 +09:00
David Smiley 0c261f4215 CHANGES.txt: 8.5: re-categorize issues 2020-03-06 21:02:52 -05:00
Bruno Roustant c73d2c15ba
LUCENE-9257: Always keep FST off-heap. Remove SegmentReadState.openedFromWriter. 2020-03-06 14:24:12 +01:00
Robert Muir 4360fa7506
add 8.6 section to master branch's MERGE-CONFLICTS.txt for consistency 2020-03-06 05:52:26 -05:00
Robert Muir 624f5a3c2f
LUCENE-9264: Remove SimpleFSDirectory in favor of NIOFSDirectory
Closes #1321
2020-03-06 05:42:22 -05:00
Bruno Roustant 9733643466
LUCENE-9257: Always keep FST off-heap. Remove FSTLoadMode and Reader attributes.
Closes #1320
2020-03-06 11:15:09 +01:00
Robert Muir 9cfdf17b28
LUCENE-9241: fix tests to pass with -Xmx128m 2020-03-05 21:29:39 -05:00
Mike McCandless e5be034df2 LUCENE-8962: woops, remove leftover accidental copyright (darned IDEs) 2020-03-05 19:04:24 -05:00
Yannick Welsch 8a88dd02c6 Remove SimpleFSDirectory in favor of NIOFSDirectory 2020-03-06 00:04:25 +01:00
Michael Sokolov a030207a5e
LUCENE-8962: Split test case (#1313)
* LUCENE-8962: Simplify test case

The testMergeOnCommit test case was trying to verify too many things
at once: basic semantics of merge on commit and proper behavior when
a bunch of indexing threads are writing and committing all at once.

Now we just verify basic behavior, with strict assertions on invariants, while
leaving it to MockRandomMergePolicy to enable merge on commit in existing
test cases to verify that indexing generally works as expected and no new
unexpected exceptions are thrown.

* LUCENE-8962: Only update toCommit if merge was committed

The code was previously assuming that if mergeFinished() was called and
isAborted() was false, then the merge must have completed successfully.
Instead, we should know for sure if a given merge was committed, and
only then update our pending commit SegmentInfos.
2020-03-05 15:49:26 -05:00
Atri Sharma 5d605102d8 Update CHANGES.txt Entry for 9114 2020-03-05 09:23:50 +05:30
Atri Sharma d751cf626e
LUCENE-9114: Improve ValueSourceScorer's Default Cost Implementation (#1303)
This commit makes ValueSourceScorer's costing algorithm also take the delegated FunctionValues's cost into consideration when calculating its cost. FunctionValues now exposes a cost method which is used by ValueSourceScorer's default matchCost method. In addition, ValueSourceScorer exposes a matchCost method which can be overridden to specify a custom costing mechanism.
2020-03-05 09:16:50 +05:30
Ignacio Vera 286d22717b
LUCENE-9225: Rectangle extends LatLonGeometry so it can be used in a geometry collection (#1258) 2020-03-03 07:36:44 +01:00
Ignacio Vera c313365c5f
LUCENE-9251: Filter equal edges with different value on isEdgeFromPolygon (#1290)
Fix bug in the polygon tessellator where edges with different values on #isEdgeFromPolygon were not filtered out properly
2020-03-03 07:07:34 +01:00
Ignacio Vera b732ce7002
LUCENE-9239: Circle2D#WithinTriangle detects properly if a triangle is Within distance. (#1280) 2020-03-03 06:44:35 +01:00
Michael Sokolov e308e53873 Add CHANGES entry for LUCENE-8962 2020-03-02 18:34:13 -05:00
msfroh f017ae465e
LUCENE-8962: Fix intermittent test failures (#1307)
1. TestIndexWriterMergePolicy.testMergeOnCommit will fail if the last
   commit (the one that should trigger the full merge) doesn't have any
   pending changes (which could occur if the last indexing thread
   commits at the end). We can fix that by adding one more document
   before that commit.
2. The previous implementation was throwing IOException if the commit
   thread gets interrupted while waiting for merges to complete. This
   violates IndexWriter's documented behavior of throwing
   ThreadInterruptedException.
2020-03-02 18:29:12 -05:00
Nicholas Knize a6e80d004d LUCENE-9150: Restore support for dynamic PlanetModel in spatial3d 2020-03-02 16:06:17 -06:00
msfroh 043c5dff6f
LUCENE-8962: Add ability to selectively merge on commit (#1155)
* LUCENE-8962: Add ability to selectively merge on commit

This adds a new "findCommitMerges" method to MergePolicy, which can
specify merges to be executed before the
IndexWriter.prepareCommitInternal method returns.

If we have many index writer threads, they will flush their DWPT buffers
on commit, resulting in many small segments, which can be merged before
the commit returns.

* Add missing Javadoc

* Fix incorrect comment

* Refactoring and fix intermittent test failure

1. Made some changes to the callback to update toCommit, leveraging
SegmentInfos.applyMergeChanges.
2. I realized that we'll never end up with 0 registered merges, because
we throw an exception if we fail to register a merge.
3. Moved the IndexWriterEvents.beginMergeOnCommit notification to before
we call MergeScheduler.merge, since we may not be merging on another
thread.
4. There was an intermittent test failure due to randomness in the time
it takes for merges to complete. Before doing the final commit, we wait
for pending merges to finish. We may still end up abandoning the final
merge, but we can detect that and assert that either the merge was
abandoned (and we have > 1 segment) or we did merge down to 1 segment.

* Fix typo

* Fix/improve comments based on PR feedback

* More comment improvements from PR feedback

* Rename method and add new MergeTrigger

1. Renamed findCommitMerges -> findFullFlushMerges.
2. Added MergeTrigger.COMMIT, passed to findFullFlushMerges and to
   MergeScheduler when merging on commit.

* Update renamed method name in strings and comments
2020-03-02 12:19:47 -05:00
Namgyu Kim b2dbd18f96
LUCENE-9253: Support custom dictionaries in KoreanTokenizer
Signed-off-by: Namgyu Kim <namgyu@apache.org>
2020-03-03 02:11:44 +09:00
Adrien Grand f4b5069c1f LUCENE-9247: Fix test failures with ExtraFS. 2020-03-02 08:59:13 +01:00
Ignacio Vera c653c04bb1
LUCENE-9243: Add fudge factor when creating a bounding box of a xycircle (#1278) 2020-03-02 06:56:29 +01:00
Adrien Grand c929b65c81 LUCENE-9247: Exclude `write.lock` from files whose integrity is expected to be verified. 2020-02-29 08:46:16 +01:00
Adrien Grand 30944d3520 LUCENE-9247: Fix class visibility issue discovered when backporting. 2020-02-28 15:34:36 +01:00
Bruno Roustant 99af698107
LUCENE-9237: Faster UniformSplit intersect TermsEnum.
Closes #1270
2020-02-28 14:52:19 +01:00
Adrien Grand d10db0d98f LUCENE-9247: Add missing commit. 2020-02-28 14:48:44 +01:00
Adrien Grand cd984e2dc0
LUCENE-9247: Add tests for `checkIntegrity`. (#1284)
This adds a test to `BaseIndexFileFormatTestCase` that the combination
of opening a reader and calling `checkIntegrity` on it reads all bytes
of all files (including index headers and footers). This would help
detect most cases when `checkIntegrity` is not implemented correctly.
2020-02-28 14:19:56 +01:00
Adrien Grand ebdfdaed9f
LUCENE-9246: Remove `dOff` argument from `LZ4#decompress`. (#1283)
It is always set to 0 at call sites.
2020-02-28 11:26:56 +01:00
Bruno Roustant e0164d1ac8
LUCENE-9245: Reduce AutomatonTermsEnum memory usage.
Closes #1281
2020-02-28 10:42:06 +01:00
Ignacio Vera 988ce9bff7
LUCENE-9250: Add support for Circle2d#intersectsLine around the dateline. (#1289) 2020-02-28 10:22:27 +01:00
Michael Sokolov 294b8d4ee1 LUCENE-9202: refactor leaf collectors in TopFieldCollector 2020-02-27 08:05:49 -05:00
Cao Manh Dat 666bd493c8 SOLR-14286: Upgrade Jaeger to 1.1.0 2020-02-27 14:51:45 +07:00
Mike McCandless 61e0b4cd87 LUCENE-9252: fix javac linter warnings in spatial-extras (thanks Andras Salamon) 2020-02-26 09:34:58 -05:00
Alan Woodward 98dafe2e10
LUCENE-9207: Don't build span queries in QueryBuilder (#1239)
QueryBuilder currently has special logic for graph phrase queries with no slop,
constructing a spanquery that attempts to follow all paths using a combination of
OR and NEAR queries. However, this type of query has known bugs (LUCENE-7398).
This commit removes this logic and just builds a disjunction of phrase queries, one
phrase per path.
2020-02-26 14:32:34 +00:00
Alan Woodward b4c2e279a9 LUCENE-9212: Fix precommit 2020-02-24 14:17:04 +00:00
Alan Woodward 19fe1eee68 LUCENE-9212: Remove deprecated Intervals.multiterm() methods 2020-02-24 11:16:37 +00:00
Alan Woodward ffb7cafe93 LUCENE-9212: Intervals.multiterm() should take CompiledAutomaton 2020-02-24 11:08:48 +00:00
Alessandro Benedetti 663611c99c
[SOLR-12238] Synonym Queries boost (#357)
SOLR-12238: Handle boosts in QueryBuilder

QueryBuilder now detects per-term boosts supplied by a BoostAttribute when
building queries using a TokenStream.  This commit also adds a DelimitedBoostTokenFilter
that parses boosts from tokens using a delimiter token, and exposes this in Solr
2020-02-24 10:29:41 +00:00
Ignacio Vera 88dd1c3f3d
LUCENE-9238: Add new XYPointField, queries and sorting capabilities (#1272)
New XYPointField field and Queries for indexing, searching and sorting cartesian points.
2020-02-21 11:26:30 +01:00
Robert Muir 9302eee1e0
LUCENE-9235: upgrade all python to python3
Die, python2, die.

Some generated .java files change (parameterized automata for
spell-correction).

This is because the order of python dictionaries was not well-defined
previously. A sort() was added so that the python code now generates
reproducible output (Thanks @mikemccand).

So we'll suffer a change once, but the automata are equivalent. If you
run the script again you should not see source code changes.

The relevant unit tests are exhaustive (if you trust the paper!), so we can
be confident it does not break things, even though it looks very scary.
2020-02-20 21:27:38 -05:00
Nhat Nguyen a0b8f5c7c2 LUCENE-9228: Sort dvUpdates by terms before apply
With this change, we sort dvUpdates in the term order before applying if
they all update a single field to the same value. This optimization can
reduce the flush time by around 20% for the docValues update user cases.
2020-02-20 13:18:10 -05:00
iverase 054b3be627 LUCENE-8707: Fix test bug. When the bounding box of a triangle
is within a circle, the triangle is within the circle as well.
2020-02-19 18:21:03 +01:00
Ignacio Vera d48bafb299
LUCENE-8707: Add LatLonShape and XYShape distance query (#587) 2020-02-19 16:03:30 +01:00
markharwood 79a4a680e7 Test fix - new binary doc values test could use invalid values. 2020-02-19 09:14:14 +00:00
Robert Muir b9a569e7be
LUCENE-9230: explicitly call python version we want from builds
On newer linux distros, at least, 'python' now means python3. So
we can't rely on what version of python it will invoke (at least for a
few years).

For example in Fedora Linux:

https://fedoraproject.org/wiki/Changes/Python_means_Python3

For python2.x code, explicitly call 'python2.7' and for python3.x code,
explicitly call 'python3'.

Ant variable names are cleaned up, e.g. 'python.exe' is renamed to
'python2.exe' and 'python32.exe' is renamed to 'python3.exe'. This also
makes it easy to identify remaining python 2.x code that should be
migrated to python 3.x
2020-02-18 18:58:17 -05:00
markharwood ce2959fe4c
LUCENE-9211 Add compression for Binary doc value fields (#1234)
Stores groups of 32 binary doc values in LZ4-compressed blocks.
2020-02-18 14:02:42 +00:00
Robert Muir 0203815ab2
LUCENE-9220: regenerate all stemmers/stopwords/test data from snowball 2.0 (#1262)
Previous situation:

* The snowball base classes (Among, SnowballProgram, etc) had accumulated local performance-related changes. There was a task that would also "patch" generated classes (e.g. GermanStemmer) after-the-fact.
* Snowball classes had many "non-changes" from the original such as removal of tabs, addition of javadocs, license headers, etc.
* Snowball test data (inputs and expected stems) was incorporated into lucene testing, but this was maintained manually. Also files had become large, making the test too slow (Nightly).
* Snowball stopwords lists from their website were manually maintained. In some cases encoding fixes were manually applied.
* Some generated stemmers (such as Estonian and Armenian) exist in lucene, but have no corresponding `.sbl` file in snowball sources at all.

Besides this mess, snowball project is "moving along" and acquiring new languages, adding non-BSD-licensed test data, huge test data, and other complexity. So it is time to automate the integration better.

New situation:

* Lucene has a `gradle snowball` regeneration task. It works on Linux or Mac only. It checks out their repos, applies the `snowball.patch` in our repository, compiles snowball stemmers, regenerates all java code, applies any adjustments so that our build is happy.
* Test data is automatically regenerated from the commit hash of the snowball test data repository. Not all languages are tested from their data: only where the license is simple BSD. Test data is also (deterministically) sampled, so that we don't have huge files. We just want to make sure our integration works.
* Randomized tests are still set to test every language with generated fake words. The regeneration task ensures all languages get tested (it writes a simple text file list of them).
* Stopword files are automatically regenerated from the commit hash of the snowball website repository.
* The regeneration procedure is idempotent. This way when stuff does change, you know exactly what happened. For example if test data changes to a different license, you may see a git deletion. Or if a new language/stopwords/test data gets added, you will see git additions.
2020-02-17 12:38:01 -05:00
Ignacio Vera ebec456602
Return CELL_CROSSES_QUERY when point inside the triangle (#1259) 2020-02-14 17:06:33 +01:00
Ignacio Vera 4a54ffb553
LUCENE-9218: XYGeometries should expose values as floats (#1252) 2020-02-14 11:39:10 +01:00
Adrien Grand 5cbe58f22c
Add back assertions removed by LUCENE-9187. (#1236)
This time they would only apply to TestFastLZ4/TestHighLZ4 and avoid slowing
down all tests.
2020-02-14 10:37:06 +01:00
Erick Erickson f9357ab0d2
LUCENE-9134: Port ant-regenerate tasks to Gradle build (util and packed) (#1251)
* LUCENE-9134: Port ant-regenerate tasks to Gradle build
2020-02-11 18:56:11 -05:00
Ignacio Vera 87421d7231 LUCENE-9216: Make sure we index LEAST_DOUBLE_VALUE (#1246) 2020-02-10 11:50:08 +01:00
Robert Muir f41eabdc5f
LUCENE-8279: fix javadocs wrong header levels and accessibility issues
Java 13 adds a new doclint check under "accessibility" that the html
header nesting level isn't crazy.

Many are incorrect because the html4-style javadocs had horrible
font-sizes, so developers used the wrong header level to work around it.
This is no issue in trunk (always html5).

Java recommends against using such structured tags at all in javadocs,
but that is a more involved change: this just "shifts" header levels
in documents to be correct.
2020-02-08 10:00:00 -05:00
Robert Muir 69f26d099e
LUCENE-9213: fix documentation-lint (and finally precommit) to work on java 12 and 13
the "missing javadocs" checker needed tweaks to work with the format
changes of java 13.

As a followup we may investigate javadoc (maybe the new doclet api). It
has its own missing checks too now, but they are black vs white (either
fully documented or not checked), whereas this python tool allows us to
"improve", e.g. enforce that all classes have doc, even if all
methods do not yet.
2020-02-07 17:18:26 -05:00
Nicholas Knize 206a70e7b7 LUCENE-9149: Increase data dimension limit in BKD 2020-02-07 16:08:14 -06:00
Ignacio Vera 73dbf6d061
LUCENE-9194: Simplify XYShapeXQuery API by adding a new abstract class called XYGeometry 2020-02-07 19:29:13 +01:00
Adrien Grand c0d1f30236 SOLR-12930: Exclude dev-docs from binary archive. 2020-02-07 10:37:43 +01:00
Robert Muir 860115e450
LUCENE-9209: revert changes to test html file, not intended 2020-02-06 22:40:40 -05:00
Robert Muir 0d339043e3
LUCENE-9209: fix javadocs to be html5, enable doclint html checks, remove jtidy
Current javadocs declare an HTML5 doctype: !DOCTYPE HTML. Some HTML5
features are used, but unfortunately also some constructs that do not
exist in HTML5 are used as well.

Because of this, we have no checking of any html syntax. jtidy is
disabled because it works with html4. doclint is disabled because it
works with html5. our docs are neither.

javadoc "doclint" feature can efficiently check that the html isn't
crazy. we just have to fix really ancient removed/deprecated stuff
(such as use of tt tag).

This enables the html checking in both ant and gradle. The docs are
fixed via straightforward transformations.

One exception is table cellpadding, for this some helper CSS classes
were added to make the transition easier (since it must apply padding
to inner th/td, not possible inline). I added TODOs, we should clean
this up. Most problems look like they may have been generated from a
GUI or similar and not a human.
2020-02-06 22:30:52 -05:00
Mike abd282d258
LUCENE-9142 Refactor IntSet operations for determinize (#1184)
* LUCENE-9142 Refactor SortedIntSet for equality

Split SortedIntSet into a class hierarchy to make comparisons to
FrozenIntSet more meaningful. Use Arrays.equals for more efficient
comparison. Add tests for IntSet to verify correctness.
2020-02-06 12:16:45 -08:00
Adrien Grand 85dba7356f LUCENE-9147: Make sure temporary files get deleted on all code paths. 2020-02-06 17:13:28 +01:00
Robert Muir 7f4560c59a
LUCENE-9199: allow building javadocs on java 13+ 2020-02-06 10:39:41 -05:00
Alan Woodward 7c1ba1aebe
LUCENE-9099: Correctly handle repeats in ORDERED and UNORDERED intervals (#1097)
If you have repeating intervals in an ordered or unordered interval source, you currently 
get somewhat confusing behaviour:

* `ORDERED(a, a, b)` will return an extra interval over just `a b` if it first matches `a a b`, meaning
that you can get incorrect results if used in a `CONTAINING` filter:
`CONTAINING(ORDERED(x, y), ORDERED(a, a, b))` will match on the document `a x a b y`
* `UNORDERED(a, a)` will match on documents that contain just a single `a`.

This commit adds a RepeatingIntervalsSource that correctly handles repeats within 
ordered and unordered sources. It also changes the way that gaps are calculated within 
ordered and unordered sources, by using a new width() method on IntervalIterator. The 
default implementation just returns end() - start() + 1, but RepeatingIntervalsSource 
instead returns the sum of the widths of its child iterators. This preserves maxgaps filtering 
on ordered and unordered sources that contain repeats.

In order to correctly handle matches in this scenario, IntervalsSource#matches now always 
returns an explicit IntervalsMatchesIterator rather than a plain MatchesIterator, which adds 
gaps() and width() methods so that submatches can be combined in the same way that 
subiterators are. Extra checks have been added to checkIntervals() to ensure that the same 
intervals are returned by both iterator and matches, and a fix to 
DisjunctionIntervalIterator#matches() is also included - DisjunctionIntervalIterator minimizes 
its intervals, while MatchesUtils.disjunction does not, so there was a discrepancy between 
the two methods.
2020-02-06 14:44:47 +00:00
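The width calculation described in LUCENE-9099 above can be sketched as follows. This is a simplified illustration rather than the Lucene classes: `Interval`, `Term`, and `Repeating` are hypothetical stand-ins. Only the two rules come from the commit message: the default width is `end() - start() + 1`, and the repeating source returns the sum of its children's widths.

```java
import java.util.List;

// Hypothetical stand-ins for IntervalIterator and RepeatingIntervalsSource.
final class IntervalWidthSketch {
    interface Interval {
        int start();
        int end();
        /** Default width: the number of positions the interval spans. */
        default int width() {
            return end() - start() + 1;
        }
    }

    /** A single-position match, e.g. one occurrence of a term. */
    static final class Term implements Interval {
        private final int position;
        Term(int position) { this.position = position; }
        public int start() { return position; }
        public int end() { return position; }
    }

    /** A repeated match whose width is the sum of its children's widths. */
    static final class Repeating implements Interval {
        private final List<Interval> children;
        Repeating(List<Interval> children) { this.children = children; }
        public int start() { return children.get(0).start(); }
        public int end() { return children.get(children.size() - 1).end(); }
        @Override
        public int width() {
            int sum = 0;
            for (Interval child : children) {
                sum += child.width();
            }
            return sum;
        }
    }
}
```

For two single-position matches at positions 0 and 2, the span covers three positions but the summed width is 2, so the unmatched position between the repeats still counts as a gap for maxgaps filtering.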
Adrien Grand fdf5ade727 LUCENE-9147: Fix codec excludes. 2020-02-06 10:34:03 +01:00
Adrien Grand 1b882246d7 LUCENE-9147: Avoid reusing file names with FileSwitchDirectory or NRTCachingDirectory and IOContext randomization. 2020-02-06 08:27:33 +01:00
Robert Muir 196ec5f4a8
LUCENE-9206: add forbidden api exclusion to new class 2020-02-05 20:30:18 -05:00
Robert Muir 93b83f635d
LUCENE-9206: Improve IndexMergeTool defaults and options
IndexMergeTool previously had no options and always forceMerge(1)
the resulting index. This can result in wasted work and confusing
performance (unbalancing the index).

Instead the default is to not do anything, except merges from the
merge policy.
2020-02-05 16:31:07 -05:00
Adrien Grand 136dcbdbbc
LUCENE-9147: Move the stored fields index off-heap. (#1179)
This replaces the index of stored fields and term vectors with two
`DirectMonotonic` arrays. `DirectMonotonicWriter` needs to know the number
of values to write up-front, so incoming doc IDs and file pointers are buffered
on disk using temporary files that never get fsynced, but have index headers
and footers to make sure any corruption in these files wouldn't propagate to the
index.

`DirectMonotonicReader` gets a specialized `binarySearch` implementation that
leverages the metadata in order to avoid going to the IndexInput as much as
possible. In the common case, it would only go to a single
sub `DirectReader` which, combined with the size of blocks of 1k values, helps
bound the number of page faults to 2.
2020-02-05 18:35:08 +01:00
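As a rough illustration of a metadata-first binary search over monotonic data, here is a sketch that searches in-memory per-block minimums first and then touches only one block of values. The class, method and parameter names are assumptions made for this example; this is not the DirectMonotonicReader implementation.

```java
// Hypothetical sketch; not Lucene's DirectMonotonicReader.
final class BlockedSearch {

  // Returns the index of the last value <= key in the sorted array `values`,
  // or -1 if every value is greater than key. `blockMins[i]` is assumed to
  // hold the first value of block i, where each block has `blockSize` values.
  static int floorIndex(long[] blockMins, long[] values, int blockSize, long key) {
    // Step 1: binary search over the in-memory metadata only.
    int lo = 0, hi = blockMins.length - 1, block = -1;
    while (lo <= hi) {
      int mid = (lo + hi) >>> 1;
      if (blockMins[mid] <= key) {
        block = mid;
        lo = mid + 1;
      } else {
        hi = mid - 1;
      }
    }
    if (block < 0) {
      return -1;
    }
    // Step 2: binary search inside the single selected block.
    lo = block * blockSize;
    hi = Math.min(values.length, lo + blockSize) - 1;
    int result = lo;
    while (lo <= hi) {
      int mid = (lo + hi) >>> 1;
      if (values[mid] <= key) {
        result = mid;
        lo = mid + 1;
      } else {
        hi = mid - 1;
      }
    }
    return result;
  }

  public static void main(String[] args) {
    long[] values = {1, 3, 5, 7, 9, 11};
    long[] blockMins = {1, 5, 9}; // first value of each block of 2
    System.out.println(floorIndex(blockMins, values, 2, 6)); // prints 2
  }
}
```

In this model only the second step would read from the underlying storage, so the number of blocks touched per lookup is one.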
Mike McCandless 47386f8cca LUCENE-9200: consistently use double (not float) math for TieredMergePolicy's decisions, to fix a corner-case bug uncovered by randomized tests 2020-02-05 09:51:31 -05:00
Ignacio Vera 641680fbf1
LUCENE-9197: fix wrong implementation on Point2D#withinTriangle (#1228) 2020-02-04 07:10:08 +01:00
Erick Erickson d3ac1329a3
LUCENE-8656: Deprecations in FuzzyQuery (#1229)
LUCENE-8656: Deprecations in FuzzyQuery

Closes #1229
2020-02-03 08:52:33 -05:00
Jan Høydahl 16b8d50284
SOLR-14221: Upgrade restlet to version 2.4.0 (#1211) 2020-02-02 11:35:14 +01:00
Kazuaki Hiraga b457c2ee2e LUCENE-9123: Add new JapaneseTokenizer constructors with discardCompoundToken option to control whether the tokenizer emits original tokens when the mode is not NORMAL. 2020-02-01 14:51:09 +09:00
Erick Erickson 5253c0cb74
LUCENE-9134 Port ant-regenerate tasks to Gradle build (#1226)
LUCENE-9134: Port ant-regenerate tasks to Gradle build Javacc sub-task. Closes #1226
2020-01-31 17:04:10 -05:00
Robert Muir 7382375d8a
support ECJ linting on newer JDK versions
The entire precommit task will still fail with unsupported java version
(subsequent checks do not support the newer javadocs format).

But this allows the ECJ linter to run, which checks for things such as
unused imports.
2020-01-31 14:16:04 -05:00
Christine Poerschke 0c1b19a321 LUCENE-8530: fix some 'rawtypes' javac warnings 2020-01-31 16:40:55 +00:00
Robert Muir 9ceaff913e
LUCENE-9195: more slow tests fixes 2020-01-31 07:57:34 -05:00
Dawid Weiss 043dd207b6 LUCENE-9080: this jflex file got corrupted somehow during previous commit. I regenerated it with ant, along with the final java file. I also added a crlf normalization, encoding and forced-regeneration to ant because it didn't work before. 2020-01-30 13:09:47 +01:00
Adrien Grand 13e2094804 LUCENE-4702: Improve performance for fuzzy queries.
Fuzzy queries with an edit distance of 1 or 2 must visit all blocks whose prefix
length is 1 or 2. By not compressing those, we can trade very little space (a
couple MBs in the case of the wikibigall index) for better query efficiency.
2020-01-30 10:37:39 +01:00
Ignacio Vera a9482911a8
LUCENE-9141: Simplify LatLonShapeXQuery API by adding a new abstract class called LatLonGeometry. (#1170) 2020-01-30 08:03:22 +01:00
Robert Muir 29469b454f
LUCENE-9192: speed up more slow tests 2020-01-29 14:31:32 -05:00
Ignacio Vera c98229948a
LUCENE-9152: Improve line intersection detection for polygons (#1187) 2020-01-29 19:24:51 +01:00
Adrien Grand 92b684c647
LUCENE-9161: DirectMonotonicWriter checks for overflows. (#1197) 2020-01-28 19:06:53 +01:00
Adrien Grand 6eb8834a57
LUCENE-4702: Reduce terms dictionary compression overhead. (#1216)
Changes include:
 - Removed LZ4 compression of suffix lengths which didn't save much space
   anyway.
 - For stats, LZ4 was only really used for run-length compression of terms whose
   docFreq is 1. This has been replaced by explicit run-length compression.
 - Since we only use LZ4 for suffix bytes if the compression ratio is < 75%, we
   now only try LZ4 out if the average suffix length is greater than 6, in order
   to reduce index-time overhead.
2020-01-28 18:38:30 +01:00
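The two thresholds mentioned above (average suffix length greater than 6, compression ratio below 75%) could be expressed as in the following sketch. The class and method names are hypothetical and the code only models the decision, not the terms dictionary writer itself.

```java
// Hypothetical sketch of the two thresholds named in the commit message.
final class SuffixCompressionHeuristic {

  // Only attempt LZ4 when the average suffix length is greater than 6.
  static boolean shouldTryLz4(int totalSuffixBytes, int numSuffixes) {
    return numSuffixes > 0 && (long) totalSuffixBytes > 6L * numSuffixes;
  }

  // Keep the compressed form only when compressed/original is below 0.75.
  // Integer arithmetic avoids floating-point rounding at the boundary.
  static boolean keepCompressed(int compressedLength, int originalLength) {
    return 4L * compressedLength < 3L * originalLength;
  }

  public static void main(String[] args) {
    System.out.println(shouldTryLz4(70, 10));    // prints true (average 7)
    System.out.println(keepCompressed(75, 100)); // prints false (ratio is exactly 0.75)
  }
}
```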
Robert Muir 4773574578
LUCENE-9189: TestIndexWriterDelete.testDeletesOnDiskFull can run for minutes
The issue is that MockDirectoryWrapper's disk full check is horribly
inefficient. On every writeByte/etc, it totally recomputes disk space
across all files. This means it calls listAll() on the underlying
Directory (which sorts all the underlying files), then sums up fileLength()
for each of those files.

This leads to many pathological cases in the disk full tests... but the
number of tests impacted by this is minimal, and the logic is scary.
2020-01-28 12:24:31 -05:00
Robert Muir 3bcc97c8eb
LUCENE-9186: remove linefiledocs usage from BaseTokenStreamTestCase 2020-01-28 11:55:51 -05:00
Robert Muir 4350efa932
LUCENE-9187: remove too-expensive assert from LZ4 HighCompressionHashTable 2020-01-28 11:45:43 -05:00
Adrien Grand 9e4c445d17 LUCENE-4702: CHANGES entry. 2020-01-27 18:27:53 +01:00
Robert Muir 975df9ddd3
LUCENE-9182: add apache license headers to all .gradle files and enforce in rat task 2020-01-27 12:05:34 -05:00
Robert Muir 8e357b167b
LUCENE-9180: dos2unix files that don't need dos line endings 2020-01-27 11:29:59 -05:00
Robert Muir fddb5314fc
LUCENE-9172: nuke some compiler warnings 2020-01-27 06:08:30 -05:00
Alan Woodward 02f862670e
LUCENE-9153: Allow WhitespaceAnalyzer to set a custom maxTokenLen (#1198)
WhitespaceTokenizer defaults to a maximum token length of 255, and WhitespaceAnalyzer
does not allow this to be changed. This commit adds an optional maxTokenLen parameter
to WhitespaceAnalyzer as well, and documents the existing token length restriction.
2020-01-27 09:22:25 +00:00
Ignacio Vera 1fe4177ac0
LUCENE-9176: Handle the case when there is only one leaf node in TestEstimatePointCount (#1212) 2020-01-27 09:52:25 +01:00
Uwe Schindler 0635756f76 Fix Windows Line endings in the source-patterns checker (silly bug: it's \r\n on windows not the other way round) 2020-01-26 11:48:24 +01:00
Robert Muir c53cc3edaf
LUCENE-9167: test speedup for slowest/pathological tests (round 3) 2020-01-24 08:58:59 -05:00
Adrien Grand b283b8df62
LUCENE-4702: Terms dictionary compression. (#1126)
Compress blocks of suffixes in order to make the terms dictionary more
space-efficient. Two compression algorithms are used depending on which one is
more space-efficient:
 - LowercaseAsciiCompression, which applies when all bytes are in the
   `[0x1F,0x3F)` or `[0x5F,0x7F)` ranges, which notably include all digits,
   lowercase ASCII characters, '.', '-' and '_', and encodes 4 chars on 3 bytes.
   It is very often applicable on analyzed content and decompresses very quickly
   thanks to auto-vectorization support in the JVM.
 - LZ4, when the compression ratio is less than 0.75.

I was a bit unhappy with the complexity of the high-compression LZ4 option, so
I simplified it in order to only keep the logic that detects duplicate strings.
The logic about what to do in case overlapping matches are found, which was
responsible for most of the complexity while only yielding tiny benefits, has
been removed.
2020-01-24 14:46:57 +01:00
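To show why bytes restricted to the two ranges above fit 4 chars in 3 bytes: each range holds 32 values, so every byte maps to a 6-bit code, and four 6-bit codes make 24 bits. The sketch below is an illustrative packing under that assumption; the bit layout and all names are invented here and need not match LowercaseAsciiCompression.

```java
// Hypothetical 6-bit packing; the layout is an assumption, not Lucene's format.
final class SixBitPacking {

  // True if b lies in [0x1F,0x3F) or [0x5F,0x7F).
  static boolean isCompressible(byte b) {
    return (b >= 0x1F && b < 0x3F) || (b >= 0x5F && b < 0x7F);
  }

  // Maps a compressible byte to a code in [0, 64).
  static int toSixBits(byte b) {
    return b < 0x3F ? b - 0x1F : b - 0x5F + 32;
  }

  static byte fromSixBits(int code) {
    return (byte) (code < 32 ? code + 0x1F : code - 32 + 0x5F);
  }

  // Packs exactly 4 compressible bytes into 3 bytes.
  static byte[] pack4(byte[] in) {
    int bits = 0;
    for (int i = 0; i < 4; i++) {
      if (!isCompressible(in[i])) {
        throw new IllegalArgumentException("byte out of range at index " + i);
      }
      bits = (bits << 6) | toSixBits(in[i]);
    }
    return new byte[] {(byte) (bits >>> 16), (byte) (bits >>> 8), (byte) bits};
  }

  // Reverses pack4: 3 bytes back to the original 4 bytes.
  static byte[] unpack4(byte[] in) {
    int bits = ((in[0] & 0xFF) << 16) | ((in[1] & 0xFF) << 8) | (in[2] & 0xFF);
    byte[] out = new byte[4];
    for (int i = 3; i >= 0; i--) {
      out[i] = fromSixBits(bits & 0x3F);
      bits >>>= 6;
    }
    return out;
  }

  public static void main(String[] args) {
    byte[] packed = pack4(new byte[] {'l', 'u', 'c', 'e'});
    System.out.println(packed.length);                // prints 3
    System.out.println(new String(unpack4(packed)));  // prints luce
  }
}
```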
Robert Muir a29a4f4aa5
LUCENE-9168: don't let crazy tests run us out of open files with these params 2020-01-24 08:46:50 -05:00
Cassandra Targett 64cb1c8fe8
SOLR-12930: Create developer docs directories in source repo (#1164) 2020-01-23 14:00:23 -06:00
Robert Muir f440fbdf59
LUCENE-9083: throw assumption if you try to remap /dev to /dev with this test mock 2020-01-22 21:58:52 -05:00
Robert Muir 1051db4038
LUCENE-9163: test speedup for slowest/pathological tests
Calming down individual test methods with double-digit execution times
after running tests many times.

There are a few more issues remaining, but this solves the majority of them.
2020-01-22 17:49:33 -05:00
Robert Muir 8fd3fbd93c
TestPointValues only index 300k docs in NIGHTLY configuration, that is too much locally 2020-01-22 10:27:15 -05:00
Robert Muir b7694535eb
mark StressRamUsageEstimator tests nightly.
This is consistently the slowest test for me in all of lucene core by
far. Takes around an entire minute. Mark it nightly: should catch any
issues with RAM estimation but keep local builds fast.
2020-01-22 10:19:44 -05:00
Robert Muir 9dae566ee7
LUCENE-9160: add params/docs to override jvm params in gradle build, default C2 off in tests.
Adds some build parameters to tune how tests run. There is an example
shown by "gradle helpLocalSettings"

Default C2 off in tests as it is wasteful locally and causes slowdown of
test runs. You can override this by setting tests.jvmargs for gradle,
or args for ant.

Some crazy lucene stress tests may need to be toned down after the
change, as they may have been doing too many iterations by default...
but this is not a new problem.
2020-01-22 09:58:30 -05:00
Robert Muir 3ecd7a03aa
LUCENE-9159: merge gradle/ant test security policies (main file) 2020-01-21 23:43:31 -05:00
Robert Muir 7e0534d87c
LUCENE-9159: merge gradle/ant test security policies 2020-01-21 21:26:37 -05:00
Robert Muir c754a764d4
LUCENE-9157: test speedup for slowest tests 2020-01-21 19:27:19 -05:00
Mike ec6a9aab09
LUCENE-9098 Use multibyte code-points for complex fuzzy query (#1194) 2020-01-21 12:16:42 -06:00
Bruno Roustant 8894babd4a
LUCENE-9135: Make UniformSplit FieldMetadata counters long.
Closes #1168
2020-01-21 11:24:26 +01:00
Adrien Grand bddb06b650 CompetitiveImpactAccumulator should protect its costly invariant checks behind an `assert`. 2020-01-20 11:16:09 +01:00
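The general pattern of putting a costly invariant check behind an `assert` looks roughly like this. The class below is a made-up example, not CompetitiveImpactAccumulator; the point is only that the O(n) check runs when the JVM is started with -ea and is skipped otherwise.

```java
// Hypothetical example of guarding an expensive check with `assert`.
final class SortedInts {
  private final int[] values;

  SortedInts(int[] values) {
    this.values = values.clone();
    // The O(n) scan below is evaluated only when assertions are enabled (-ea).
    assert isSorted() : "values must be sorted";
  }

  // Costly invariant check: scans the whole array.
  private boolean isSorted() {
    for (int i = 1; i < values.length; i++) {
      if (values[i - 1] > values[i]) {
        return false;
      }
    }
    return true;
  }

  int size() {
    return values.length;
  }

  public static void main(String[] args) {
    System.out.println(new SortedInts(new int[] {1, 2, 3}).size()); // prints 3
  }
}
```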
Nicholas Knize aad849bf87 LUCENE-8621: Refactor LatLonShape, XYShape, and all query and utility classes from sandbox to core 2020-01-17 14:34:40 -06:00
Mike 338d386ae0
LUCENE-9145 First pass addressing static analysis (#1181)
Fixed a bunch of the smaller warnings found by error-prone compiler
plugin, while ignoring a lot of the bigger ones.
2020-01-17 13:30:39 -06:00
Mike McCandless 8147e491ce LUCENE-9053: improve FST's package-info.java comment to clarify required (Unicode code point) sort order for FST.Builder 2020-01-17 13:35:05 -05:00
Adrien Grand fb3ca8d000
LUCENE-9116: Remove long[] from `PostingsWriterBase#encodeTerm`. (#1149) (#1158)
All the metadata can be directly encoded in the `DataOutput`.
2020-01-17 13:39:45 +01:00
Christine Poerschke f04a5177e6 Update copyright year(s) in lucene/NOTICE.txt and solr/NOTICE.txt files. 2020-01-16 18:13:47 +00:00
Dawid Weiss 1e4565ce26 Don't delete jetty-start when regenerating sha checksums from ant. 2020-01-16 18:58:55 +01:00
Nicholas Knize 78655239c5 LUCENE-8369: Remove obsolete spatial module 2020-01-16 11:22:05 -06:00
Ignacio Vera eb13d5bc8b
LUCENE-9144: Fix error message on OneDimensionBKDWriter when too many points are added to the writer. (#1178) 2020-01-16 16:11:46 +01:00
Alan Woodward 9d72bfc1af
LUCENE-9068: Build FuzzyQuery automata up-front (#1042)
FuzzyTermsEnum can now either take an array of compiled automata, and
an AttributeSource, to be used across multiple segments (eg during
FuzzyQuery rewrite); or it can take a term, edit distance, prefix and transition
boolean and build the automata itself if only being used once (eg for fuzzy
nearest neighbour calculations).

Rather than interact via attribute sources and specialized attributes, users of
FuzzyTermsEnum can get the boost and set minimum competitive boosts
directly on the enum.
2020-01-15 14:58:11 +00:00
Dawid Weiss c51a4a030b Merge remote-tracking branch 'origin/master' into gradle-master 2020-01-15 13:00:02 +01:00
Ignacio Vera ff365a0abf
LUCENE-8903: Add LatLonShape point query (#762) 2020-01-15 11:57:53 +01:00
Dawid Weiss 08d2c2d0df Merge remote-tracking branch 'origin/master' into gradle-master 2020-01-15 09:54:45 +01:00
Dawid Weiss 567706041d LUCENE-9117: RamUsageEstimator hangs with AOT compilation. Removed any attempt to estimate Long.valueOf cache size: there will be some resulting overestimation but it shouldn't be harmful. 2020-01-15 09:50:51 +01:00
Dawid Weiss fb5ba8c9de LUCENE-9117: follow-up. 2020-01-14 16:12:26 +01:00
Dawid Weiss 742301ca15 LUCENE-9117: RamUsageEstimator hangs with AOT compilation. Removed any attempt to estimate Long.valueOf cache size. 2020-01-14 16:05:39 +01:00
Ishan Chattopadhyaya 3e3a0f9bc2 Add back-compat indices for 8.4.1 2020-01-14 19:41:43 +05:30
Ishan Chattopadhyaya e3f3f3bbef Add bugfix version 8.4.1 2020-01-14 18:54:32 +05:30
Dawid Weiss 3008dd9526 Merge remote-tracking branch 'origin/master' into gradle-master 2020-01-13 17:55:53 +01:00
Dawid Weiss 7dc4df9524 LUCENE-9126: enable javadoc linting bypassing java bug. Corrected syntax errors so that validation passes but had to disable ALL html checks (tons of them). 2020-01-13 17:50:57 +01:00
0xflotus 5a73ad0178 Two minor Javadoc cleanups (#1002) 2020-01-13 09:22:04 -05:00
Bruno Roustant 0528621d2f
LUCENE-9125: Optimize Automaton.step() with binary search and introduce Automaton.next().
Closes #1160
2020-01-13 10:27:31 +01:00
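A step function that uses binary search can be sketched as follows, assuming a state's transitions are stored as sorted, non-overlapping label ranges. The class and its fields are hypothetical and do not reflect how Automaton stores transitions internally.

```java
// Hypothetical sketch; not Lucene's Automaton class.
final class SortedTransitions {
  private final int[] mins;   // lowest label of each transition, ascending
  private final int[] maxes;  // highest label of each transition (inclusive)
  private final int[] dests;  // destination state of each transition

  // Transitions must be sorted by min label and must not overlap.
  SortedTransitions(int[] mins, int[] maxes, int[] dests) {
    this.mins = mins;
    this.maxes = maxes;
    this.dests = dests;
  }

  // Returns the destination state for `label`, or -1 if no transition matches.
  int step(int label) {
    int lo = 0, hi = mins.length - 1;
    while (lo <= hi) {
      int mid = (lo + hi) >>> 1;
      if (label < mins[mid]) {
        hi = mid - 1;
      } else if (label > maxes[mid]) {
        lo = mid + 1;
      } else {
        return dests[mid];
      }
    }
    return -1;
  }

  public static void main(String[] args) {
    SortedTransitions t =
        new SortedTransitions(new int[] {'a', 'm'}, new int[] {'c', 'z'}, new int[] {1, 2});
    System.out.println(t.step('b')); // prints 1
    System.out.println(t.step('d')); // prints -1
  }
}
```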
Dawid Weiss f9dde4de52 Merge remote-tracking branch 'origin/master' into gradle-master 2020-01-13 08:37:15 +01:00
Erick Erickson 3bae63d215 LUCENE-9080: Upgrade ICU4j to 62.2 and make regenerate work 2020-01-12 17:12:57 -05:00
Dawid Weiss d7726495c5 Merge remote-tracking branch 'origin/master' into gradle-master 2020-01-09 19:22:09 +01:00
Dawid Weiss 4599c51f0d LUCENE-9122: add support for running tests against alternate jvms. 2020-01-09 19:00:32 +01:00
Adrien Grand 239d9a6726 Revert "LUCENE-9116: Remove long[] from `PostingsWriterBase#encodeTerm`. (#1149)"
This reverts commit d0b4a166e0.
2020-01-09 17:37:54 +01:00
Adrien Grand d0b4a166e0
LUCENE-9116: Remove long[] from `PostingsWriterBase#encodeTerm`. (#1149)
LUCENE-9116: Remove long[] from `PostingsWriterBase#encodeTerm`.

All the metadata can be directly encoded in the `DataOutput`.
2020-01-09 15:16:26 +01:00
Adrien Grand 7ad33c3a98
LUCENE-9115: NRTCachingDirectory shouldn't cache files of unknown size. (#1145) 2020-01-09 15:15:30 +01:00
Adrien Grand b11c3cffe4
LUCENE-9118: BlockTreeTermsReader uses `Arrays#compareUnsigned` to compare suffixes. (#1150) 2020-01-09 15:09:21 +01:00
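For reference, `Arrays#compareUnsigned` (available since Java 9) compares byte ranges as unsigned values, which matters once a byte is 0x80 or above. The wrapper below is a hypothetical helper written for illustration, not code from BlockTreeTermsReader.

```java
import java.util.Arrays;

// Hypothetical helper showing unsigned comparison of two byte ranges.
final class UnsignedSuffixCompare {

  // Compares a[aOff, aOff + aLen) with b[bOff, bOff + bLen) as unsigned bytes.
  static int compareSuffix(byte[] a, int aOff, int aLen, byte[] b, int bOff, int bLen) {
    return Arrays.compareUnsigned(a, aOff, aOff + aLen, b, bOff, bOff + bLen);
  }

  public static void main(String[] args) {
    byte[] high = {(byte) 0x80};
    byte[] low = {0x7F};
    System.out.println(compareSuffix(high, 0, 1, low, 0, 1) > 0); // prints true
    System.out.println(Arrays.compare(high, low) > 0);            // prints false (signed)
  }
}
```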
Dawid Weiss 0674fada65 Merge remote-tracking branch 'origin/master' into gradle-master 2020-01-09 11:56:02 +01:00
Mike McCandless deba7d1404 LUCENE-9084: fix potential deadlock due to circular synchronization in AnalyzingInfixSuggester 2020-01-08 19:28:36 -05:00
Dawid Weiss 405d227c55 Merge remote-tracking branch 'origin/master' into gradle-master 2020-01-07 08:45:12 +01:00
Adrien Grand b6f31835ad LUCENE-8673: Avoid OOMEs because of IOContext randomization. 2020-01-06 14:43:41 +01:00
Adrien Grand 6bb1f6cbbe LUCENE-9096: CHANGES entry. 2020-01-06 09:20:29 +01:00
kkewwei 2db4c909ca LUCENE-9096: Simplify CompressingTermVectorsWriter#flushOffsets. (#1125) 2020-01-06 09:19:08 +01:00
Armin Braun f87b4c13bb Fix Incorrect Constant Name in Codec Docs (#1047)
The name was wrong here. Also, added a link to make this doc more
fun to navigate in HTML form and make sure it doesn't go bad again.
2020-01-06 09:03:40 +01:00
Adrien Grand dd74869347
BlockTreeTermsWriter should compute prefix lengths using Arrays#mismatch. (#1074) 2020-01-06 09:02:51 +01:00
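A shared prefix length can be computed with `Arrays#mismatch` (Java 9+), which returns the first index where two arrays differ, or -1 when they are equal. The helper name below is an assumption for this example and is not taken from BlockTreeTermsWriter.

```java
import java.util.Arrays;

// Hypothetical helper: length of the common prefix of two byte arrays.
final class PrefixLength {

  static int commonPrefixLength(byte[] a, byte[] b) {
    int mismatch = Arrays.mismatch(a, b);
    // -1 means the arrays are identical, so the whole array is the prefix.
    return mismatch == -1 ? a.length : mismatch;
  }

  public static void main(String[] args) {
    byte[] lucene = {'l', 'u', 'c', 'e', 'n', 'e'};
    byte[] lucid = {'l', 'u', 'c', 'i', 'd'};
    System.out.println(commonPrefixLength(lucene, lucid)); // prints 3
  }
}
```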
Adrien Grand dcc01fdaa6
LUCENE-9113: Speed up merging doc values' terms dictionaries. (#1136) 2020-01-06 09:01:42 +01:00
Dawid Weiss f5f1f8fad7 Merge remote-tracking branch 'origin/master' into gradle-master 2020-01-03 14:13:21 +01:00
Dawid Weiss 2150f9ccc3 Don't invoke RamUsageTester.sizeOf(buf) over and over on nightly tests. 2020-01-03 13:46:58 +01:00
Dawid Weiss 1a690d95ad Merge remote-tracking branch 'origin/master' into gradle-master 2020-01-02 10:49:19 +01:00
Dawid Weiss b1bb7bf8c2 Move newDirectory() creation to before, otherwise if something happens prior to before/after rule being invoked, the directory wouldn't be closed/ cleaned up properly. 2020-01-02 10:48:59 +01:00
Dawid Weiss 128fd9a4ff Move newDirectory() creation to before, otherwise if something happens prior to before/after rule being invoked, the directory wouldn't be closed/ cleaned up properly. 2020-01-02 10:48:10 +01:00
Nándor Mátravölgyi 4c9cc2cefd LUCENE-9093: UnifiedHighlighter LengthGoalBreakIterator frag align
Matches in passages should be centered better on average.
Closes #1123
2020-01-01 00:57:00 -05:00
Bruno Roustant 1851779ddb
LUCENE-9106: UniformSplit postings format allows extension of block/line serializers.
Closes #1106
2019-12-31 10:14:55 +01:00
Dawid Weiss a40b3e755b Merge remote-tracking branch 'origin/master' into gradle-master 2019-12-31 10:08:07 +01:00
Dawid Weiss 5bb5f7eddf Upgrade Randomizedtesting to 2.7.6 2019-12-31 09:42:44 +01:00
Bruno Roustant bbb6e418e4
LUCENE-9105: UniformSplit postings format detects corrupted index and better handles IO exceptions.
Closes #1105
2019-12-30 12:23:50 +01:00
Dawid Weiss d79b678b39 Merge remote-tracking branch 'origin/master' into gradle-master 2019-12-30 09:24:46 +01:00
Adrien Grand ca6bd364fb Add back-compat indices for 8.4.0 2019-12-29 21:43:09 +01:00
Dawid Weiss 11a946d145 Merge remote-tracking branch 'origin/master' into gradle-master 2019-12-27 15:17:24 +01:00
Uwe Schindler e06ad4cfb5
LUCENE-9110: Refactor stack analysis in tests to use generalized LuceneTestCase methods that use StackWalker 2019-12-27 11:54:00 +01:00
Dawid Weiss 23f3fd2d48 Merge remote-tracking branch 'origin/master' into gradle-master 2019-12-25 13:14:57 +01:00
Uwe Schindler 65611f6d66
LUCENE-9109: Use stack walker to implement TestSecurityManager's detection of JVM exit (#1114)
Use stack walker (Java 11 on master only) to implement TestSecurityManager's detection of test JVM exit
2019-12-25 12:45:05 +01:00
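Checking the call stack with `StackWalker` (Java 9+) can look like the sketch below, which reports whether a given class and method appear among the current frames. The helper and its names are hypothetical; this is not the TestSecurityManager code, only the general technique.

```java
// Hypothetical helper; not the actual TestSecurityManager implementation.
final class CallerCheck {

  // Returns true if any frame on the current thread's stack belongs to
  // the given class and method.
  static boolean calledFrom(String className, String methodName) {
    return StackWalker.getInstance().walk(frames ->
        frames.anyMatch(f ->
            f.getClassName().equals(className) && f.getMethodName().equals(methodName)));
  }

  // Example caller so that the check has a known frame to find.
  static boolean probe() {
    return calledFrom(CallerCheck.class.getName(), "probe");
  }

  public static void main(String[] args) {
    System.out.println(probe());                                // prints true
    System.out.println(calledFrom("no.such.Class", "nothing")); // prints false
  }
}
```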
Robert Muir 126d6b7767
SOLR-13984: add (experimental, disabled by default) security manager support (#1082)
* SOLR-13984: add (experimental, disabled by default) security manager support.

User can set SOLR_SECURITY_MANAGER_ENABLED=true to enable security manager at runtime.

The current policy file used by tests is moved to solr/server
Additional permissions are granted for the filesystem locations set by bin/solr, and networking everywhere is enabled.

This takes advantage of the fact that permission entries are ignored if properties are not defined:
https://docs.oracle.com/javase/7/docs/technotes/guides/security/PolicyFiles.html#PropertyExp
2019-12-24 06:30:31 -08:00
Nándor Mátravölgyi 1be5b68964 LUCENE-9091: UnifiedHighlighter HTML escaping should only
escape essentials
2019-12-23 17:20:48 -05:00
Bruno Roustant 663bfe2d8b
LUCENE-9102: update changes.txt 2019-12-23 16:54:07 +01:00
Andy Webb 45dce34316
LUCENE-9102: Add maxQueryLength option to DirectSpellchecker.
Closes #1103
2019-12-23 11:41:56 +01:00
noble db2b21a169 revert unnecessary commits 2019-12-23 15:24:37 +11:00
Noble Paul ef15ae9805
SOLR-14125 : Streaming expressions to be loadable from packages (#1108)
SOLR-14125: Make <expressible> plugins work with packages
2019-12-23 15:20:26 +11:00
Michael Sokolov 93309e9728 LUCENE-8596: Treat hash mark as comment only at beginning of line in kuromoji
user dictionary. Via Masaru Hasegawa and Satoshi Kato
2019-12-21 14:09:40 -05:00
Dawid Weiss 5897b78572 Merge remote-tracking branch 'origin/master' into gradle-master 2019-12-20 17:35:40 +01:00
Mike Drob 1333bd10a7 SOLR-13190 Fix for failing test 2019-12-19 22:49:26 -06:00
Kevin Risden aab3c5faa3
SOLR-14106: Cleanup Jetty SslContextFactory usage
Jetty 9.4.16.v20190411 and up introduced separate
client and server SslContextFactory implementations.
This split requires the proper use of
SslContextFactory in clients and server configs.

This fixes the following
* SSL with SOLR_SSL_NEED_CLIENT_AUTH not working since v8.2.0
* Http2SolrClient SSL not working in branch_8x

Signed-off-by: Kevin Risden <krisden@apache.org>
2019-12-19 23:05:47 -05:00
Mike Drob a4c884a22f LUCENE-9098 Report bad term for fuzzy query
When a fuzzy query encounters a term that is too complex, the exception
should report the term instead of a cryptic message about too many
states.
2019-12-19 10:58:28 -06:00
Adrien Grand 907d1142fa LUCENE-9103: WANDScorer can miss some hits in some rare conditions. 2019-12-19 17:14:55 +01:00
iverase b1c6d7c0c3 Move changes entry to 8.4 2019-12-19 08:47:07 +01:00
Dawid Weiss 2e453afa28 Merge tika upgrade monster-patch. commons-csv excluded from gradle. 2019-12-18 09:51:57 +01:00
Dawid Weiss 28b19c2af2 Merge with master. 2019-12-18 09:32:35 +01:00
Dawid Weiss 71a5714e29 SOLR-14103: remove extra unused dependencies (jersey-core, jersey-server, netty-all). 2019-12-18 09:18:32 +01:00
Tim Allison 279a391cf3
SOLR-14054 -- upgrade to Tika 1.23 (and its dependencies) (#1092)
* SOLR-14054 -- upgrade to Tika 1.23 (and its dependencies)

* fix CHANGES.txt file
2019-12-17 16:09:08 -05:00
Dawid Weiss 845b20224d SOLR-14103: follow up 2019-12-17 16:12:40 +01:00
Dawid Weiss 4c94a13e69 Merge remote-tracking branch 'origin/master' into gradle-master 2019-12-17 13:38:14 +01:00
Ignacio Vera 17ef175224
LUCENE-9055: Fix the detection of lines crossing triangles through edge points (#1020) 2019-12-17 09:38:58 +01:00
Robert Muir dc35e5752b LUCENE-9094: Ban ObjectInputStream and ObjectOutputStream in forbidden-apis 2019-12-16 13:31:11 -05:00
Dawid Weiss bc539fc0fd Merge remote-tracking branch 'origin/master' into gradle-master 2019-12-16 11:20:45 +01:00
Kevin Risden 7a9a6ef79e
SOLR-14077: Hadoop shouldn't need to look for metrics config in user home
Signed-off-by: Kevin Risden <krisden@apache.org>
2019-12-13 22:08:48 -05:00
erick 8278886966 SOLR-14026: Upgrade Jetty to 9.4.24.v20191120 and dropwizard to 4.1.2 2019-12-13 10:01:37 -05:00