11342 Commits

Author SHA1 Message Date
Andrzej Bialecki 7989a863fa LUCENE-8855: Fix some size estimates and relax test assertions to work under different JVMs. 2019-06-28 10:33:27 +02:00
Sven Amann 7c3d6c7214 LUCENE-8890: Improve parallel iteration of two lists of same length. (#446)
The class `BooleanWeight` takes a `BooleanQuery` (a list of `BooleanClause`s) as input and maintains a list of weights corresponding to the clauses. The clauses and the weights are iterated in parallel in various places throughout the class. At these code locations, it is not obvious that these two lists always have the same length, i.e., that the parallel iteration is safe. Moreover, the parallel iteration is not well supported by the Java language, which is why this operation is implemented differently throughout the code.

This patch joins the two lists to enable parallel iteration without managing two separate lists. This makes the code’s intent more obvious and prevents bugs due to the lists getting out of sync by a future change.
2019-06-28 09:50:37 +02:00
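The idea of joining two parallel lists can be sketched as follows. This is a minimal illustration, not the code from the patch: `WeightedClause`, `join` and `describe` are hypothetical names, and plain `String`/`double` values stand in for Lucene's clause and weight types.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in types; these are NOT Lucene's BooleanClause or Weight.
final class PairedIterationSketch {

    /** One element carrying both halves, so the two can never get out of sync. */
    static final class WeightedClause {
        final String clause;
        final double weight;

        WeightedClause(String clause, double weight) {
            this.clause = clause;
            this.weight = weight;
        }
    }

    /** Joins two same-length lists once, checking the length invariant up front. */
    static List<WeightedClause> join(List<String> clauses, List<Double> weights) {
        if (clauses.size() != weights.size()) {
            throw new IllegalArgumentException("lists must have the same length");
        }
        List<WeightedClause> joined = new ArrayList<>(clauses.size());
        for (int i = 0; i < clauses.size(); i++) {
            joined.add(new WeightedClause(clauses.get(i), weights.get(i)));
        }
        return joined;
    }

    /** Every later loop is a plain for-each over a single list. */
    static String describe(List<WeightedClause> joined) {
        StringBuilder sb = new StringBuilder();
        for (WeightedClause wc : joined) {
            sb.append(wc.clause).append('=').append(wc.weight).append(';');
        }
        return sb.toString();
    }
}
```

After the join, no call site needs to index into two lists, so the "same length" assumption is enforced in exactly one place.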
Atri Sharma 7cd20384de LUCENE-8889: Add Tests For Accessors Of Ranges in PointRangeQuery (#748) 2019-06-27 13:55:15 +02:00
Adrien Grand 23b6a3cd3a LUCENE-8871: Fix precommit failures. 2019-06-27 12:03:25 +02:00
iverase 754ce1f437 LUCENE-8886: Fix TestMutablePointsReaderUtils tests 2019-06-27 11:35:54 +02:00
Adrien Grand 7032176705 LUCENE-8815: Remove leftover println. 2019-06-27 08:09:26 +02:00
Adrien Grand 82234ef2f4 LUCENE-8855: Remove unused import. 2019-06-27 08:08:51 +02:00
Adrien Grand b7029b35d5 LUCENE-8815: Use a LogMergePolicy when the order of documents is important. 2019-06-27 08:08:51 +02:00
Michael Sokolov 024e200bb9 LUCENE-8871: promote kuromoji tools to main jar 2019-06-26 22:34:00 -04:00
Andrzej Bialecki a76c962ee6 LUCENE-8855: Add Accountable to some Query implementations. 2019-06-26 15:26:54 +02:00
Alan Woodward 6751c072ab LUCENE-8811: Remove deprecated BooleanQuery maxCount methods 2019-06-26 10:55:55 +01:00
Alan Woodward 53f56fb7ad LUCENE-8811: Move max clause checks to IndexSearcher 2019-06-26 10:55:55 +01:00
jimczi 889f73105f LUCENE-8859: The completion suggester's postings format now has an option to load its internal FST off-heap. 2019-06-26 11:16:51 +02:00
Ignacio Vera dac4310129
LUCENE-8868: New storing strategy for BKD tree leaves with low cardinality (#730)
When a leaf has only a few distinct values, we store each distinct value along with its cardinality.
2019-06-26 10:16:12 +02:00
Ignacio Vera 36eaf75b1f
LUCENE-8879: Improve BKDRadixSelector tests
This change adds an explicit test for the sorting capabilities.
2019-06-26 09:45:44 +02:00
Julie Tibshirani 5bf023cf19 LUCENE-7714: Add a range query in sandbox that takes advantage of index sorting. 2019-06-26 09:17:48 +02:00
jimczi b85840b97f LUCENE-8848: Fix IndexWriter leak when TestUnifiedHighlighter#testNotReanalyzed is ignored 2019-06-25 10:36:25 +02:00
David Smiley 85ec39d931 SOLR-13367: Range queries will now highlight in hl.method=unified mode.
Lucene MatchesUtils.disjunction method for disjunction over
 BytesRefIterator terms.
2019-06-25 00:10:08 -04:00
Alan Woodward c33177e428 LUCENE-8766: Further checks against race in test 2019-06-24 10:04:12 +01:00
Ignacio Vera d9dbb70d01
LUCENE-8838: Remove support for Steiner points (#703)
This is currently not used/supported.
2019-06-24 09:41:33 +02:00
Tomoko Uchida 559abd8f28 LUCENE-8778: Update the changelog because this was backported to 8.x branch. 2019-06-22 20:48:41 +09:00
Tomoko Uchida 2d4dea370a LUCENE-8778: Add SPI name and documentation for the KoreanNumberFilterFactory 2019-06-22 20:23:01 +09:00
Tomoko Uchida 422cf14439 Resolve conflicts in CHANGES. 2019-06-22 16:41:27 +09:00
Tomoko Uchida 8e81f47ca6 LUCENE-8793: Luke enhanced UI for CustomAnalyzer: show detailed analysis steps.
Co-authored-by: Jun Ohtani
Co-authored-by: Tomoko Uchida
2019-06-22 16:22:26 +09:00
Tomoko Uchida 98c85a0e1a LUCENE-8778: Define analyzer SPI names as static final fields and document the names in all analysis components. This also changes SPI loader to detect service names via the static NAME fields instead of class names. 2019-06-22 10:46:37 +09:00
David Smiley 54cc70127b LUCENE-8848 LUCENE-7757 LUCENE-8492: UnifiedHighlighter.hasUnrecognizedQuery
The UH now detects that parts of the query are not understood by it.
When found, it highlights more safely/reliably.
Fixes compatibility with complex and surround query parsers.
2019-06-21 17:05:56 -04:00
Simon Willnauer b3e759a658
Expose IndexSearcher's executor in order to enable searcher cloning (#732)
Today if an executor was added to the IndexSearcher, it's impossible to
clone the searcher with its cache, similarity and caching policy since
the executor is not exposed. This adds a simple getter to make cloning
easier.
2019-06-21 10:28:47 +02:00
Robert Muir 91331d1a89 LUCENE-8866: remove kuromoji/tools dependency on ICU 2019-06-20 21:20:17 -04:00
Michael Sokolov aa29bea071 Add missing javadocs for new BinaryDictionary.ResourceScheme 2019-06-21 01:01:10 +02:00
Michael Sokolov 4502065f03 LUCENE-8863: enhance Kuromoji DictionaryBuilder tool
added tests
 enabled ids up to 8191
 support loading custom system dictionary from filesystem or classpath
2019-06-21 00:38:44 +02:00
Alan Woodward 371f50acc2 LUCENE-8766: Fix timing problem in test 2019-06-20 17:32:23 +01:00
Jan Høydahl 87c131baa7
LUCENE-8852 ReleaseWizard tool (#710) 2019-06-20 14:45:17 +02:00
Simon Willnauer c6899fc40d
LUCENE-8865: Move to executor in IndexSearcher (#731)
In order to simplify testing this change moves to use the Executor
interface instead of ExecutorService. This change also simplifies
customizing execute methods for use-cases that need to add additional
logic for forking to new threads. This change also adds a test for
the optimization added in LUCENE-8865.

This change is fully backwards compatible since ExecutorService implements
Executor.
2019-06-20 14:26:40 +02:00
Adrien Grand 7c5247c60c LUCENE-8847: Fix typo in CHANGES. 2019-06-19 09:51:29 +02:00
Michael Sokolov 2e49f13aa1 LUCENE-8781: add FST array-with-gap addressing to Util.readCeilArc 2019-06-18 21:28:16 +02:00
Adrien Grand 2e468abecc LUCENE-8853: Don't return a FileSwitchDirectory when asked for a FS directory. 2019-06-18 17:15:21 +02:00
Simon Willnauer 60f3b25d06
LUCENE-8865: Use incoming thread for execution if IndexSearcher has an executor (#725)
Today we don't utilize the incoming thread for a search when IndexSearcher
has an executor. This thread is only idling but can be used to execute a search
once all other collectors are dispatched.
2019-06-18 14:56:51 +02:00
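A minimal sketch of the pattern described above, under the assumption that work is split into independent tasks: all tasks except the last are handed to the executor, and the last one runs on the calling thread instead of leaving it idle. `CallerThreadSketch` and `runAll` are hypothetical names, not IndexSearcher API. The method only needs the `Executor` interface.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Executor;
import java.util.concurrent.FutureTask;

// Hypothetical helper; illustrates the idea only, not Lucene's implementation.
final class CallerThreadSketch {

    static <T> List<T> runAll(Executor executor, List<Callable<T>> tasks) {
        List<FutureTask<T>> futures = new ArrayList<>(tasks.size());
        for (Callable<T> task : tasks) {
            futures.add(new FutureTask<>(task));
        }
        // Fork every task except the last one to the executor.
        for (int i = 0; i < futures.size() - 1; i++) {
            executor.execute(futures.get(i));
        }
        // Run the last task on the incoming (calling) thread.
        if (!futures.isEmpty()) {
            futures.get(futures.size() - 1).run();
        }
        // Collect results in task order, waiting for the forked tasks.
        List<T> results = new ArrayList<>(futures.size());
        for (FutureTask<T> future : futures) {
            try {
                results.add(future.get());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new RuntimeException(e);
            } catch (ExecutionException e) {
                throw new RuntimeException(e.getCause());
            }
        }
        return results;
    }
}
```

Any `Executor` works here, including a lambda such as `r -> new Thread(r).start()`.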
Luca Cavanna 4fd09eb3e3 LUCENE-8796: Use exponential search in IntArrayDocIdSetIterator#advance (#667) 2019-06-18 10:29:51 +02:00
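For readers unfamiliar with the technique, here is a generic exponential (galloping) search over a sorted `int[]`. It is a sketch of the general algorithm, not the code in `IntArrayDocIdSetIterator`; the class and method names are hypothetical, and integer overflow of the probe distance is ignored for brevity.

```java
// Generic exponential search sketch; names are hypothetical.
final class ExponentialSearchSketch {

    /**
     * Returns the index of the first element >= target in the sorted
     * range docs[from, to), or {@code to} if there is none.
     */
    static int advance(int[] docs, int from, int to, int target) {
        // Gallop: double the probe distance until we pass the target or the end.
        int bound = 1;
        while (from + bound < to && docs[from + bound] < target) {
            bound *= 2;
        }
        // The answer now lies in [from + bound / 2, min(from + bound, to)].
        int lo = from + bound / 2;
        int hi = Math.min(from + bound, to);
        // Plain binary search for the first element >= target in that window.
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (docs[mid] < target) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        return lo;
    }
}
```

The cost is logarithmic in the distance to the target rather than in the array length, which helps when the target is usually close to the current position.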
Simon Willnauer fb6e28d9f1
LUCENE-8853: Try parsing original file extension from tmp file (#716)
FileSwitchDirectory fails if the tmp files are not in the same directory
as the file they are renamed to. This is correct behavior but breaks with
tmp files used with index sorting. This change makes a best effort to find
the right extension directory if the file name ends with `.tmp`.
2019-06-18 08:47:59 +02:00
Cao Manh Dat 0c24aa6c75 SOLR-13541: Upgrade Jetty to 9.4.19.v20190610 2019-06-14 15:46:19 +01:00
Charlie Yan af2a4fe464 Update package-info.java (#388)
add a missing parenthesis
2019-06-14 14:54:57 +02:00
Jan Høydahl d2793688ca
LUCENE-8861: Script to find open PRs that need attention (#719) 2019-06-14 13:30:04 +02:00
Alan Woodward b8c299640d LUCENE-8766: Pass BytesRef offset/length when decoding from input stream 2019-06-13 16:40:03 +01:00
Alan Woodward b588e0b19e LUCENE-8766: Add CHANGES entry 2019-06-13 10:18:12 +01:00
Alan Woodward 251dbe7cea LUCENE-8766: Add monitor subproject 2019-06-13 09:40:57 +01:00
Simon Willnauer 608d9134ad LUCENE-8835: Irony - our tests don't emulate windows well enough 2019-06-12 17:56:06 +02:00
Simon Willnauer e6a9bfb8b2 LUCENE-8853: Temporarily disable random FileSwitchDirectory 2019-06-11 21:32:45 +02:00
Simon Willnauer b6c68ccded
LUCENE-8835: Respect file extension when listing files from FileSwitchDirectory (#700)
FileSwitchDirectory splits file actions between 2 directories based
on file extensions. The extensions are respected on write operations
like delete or create but ignored when we list the content of the
directories. Until now we only deduplicated the contents on
Directory#listAll, which can cause inconsistencies and hard-to-debug
errors due to double deletions in IndexWriter if a file is pending
delete in one of the directories but still shows up in the directory
listing from the other directory. This case can happen if both
directories point to the same underlying FS directory, which is a
common use-case to split between mmap and NIOFS.

This change filters out files from directories depending on their
file extension to make sure files that are deleted in one directory
are not returned from another if they point to the same FS directory.
2019-06-11 17:27:55 +02:00
Alan Woodward 7a2b965106 LUCENE-8845: Add additional max boolean clause cap on expansion 2019-06-11 12:11:29 +01:00
Alan Woodward 142a20bb0b LUCENE-8843: Fix precommit 2019-06-11 10:19:37 +01:00
Alan Woodward 50d65889df LUCENE-8815: Ensure single segments in tests 2019-06-11 10:19:37 +01:00
Adrien Grand fb0f1776a5 LUCENE-8843: Add CHANGES entry. 2019-06-11 10:22:05 +02:00
Jason Tedor 4fdcb14acf LUCENE-8843: Only ignore IOException on dirs when invoking force (#706)
Today in the method IOUtils#fsync we ignore IOExceptions when fsyncing a
directory. However, the catch block here is too broad, for example it
would be ignoring IOExceptions when we try to open a non-existent
file. This commit addresses that by scoping the ignored exceptions only
to the invocation of FileChannel#force. This prevents us from
suppressing an exception in case we run into an unexpected issue when
opening the file.

However, fsyncing directories on Windows is not possible. We always
suppressed this by allowing that an AccessDeniedException is thrown when
attempting to open the directory for reading. Yet, per the above, this
suppression also allowed other IOExceptions to be suppressed, and that
should be considered a bug (e.g., not only the directory not existing,
but any filesystem error and other reasons that we might get an access
denied there, like genuine permissions issues). Rather than relying on
exceptions for flow control and continuing to suppress there, we simply
return early if attempting to fsync a directory on Windows (we should
not put this burden on the caller).
2019-06-11 10:19:14 +02:00
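The narrowed exception handling described above might look roughly like the following. This is a sketch under assumptions, not the actual `IOUtils#fsync` source: the class name and the `os.name` check are illustrative. The point is that only the `force` call is allowed to fail silently for directories, while failures to open the path always propagate.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Illustrative sketch; not the actual Lucene IOUtils code.
final class FsyncSketch {

    static void fsync(Path path, boolean isDir) throws IOException {
        // Assumption: detect Windows via the os.name system property.
        boolean windows = System.getProperty("os.name").startsWith("Windows");
        if (isDir && windows) {
            // Directories cannot be fsynced on Windows: return early
            // instead of using exceptions for flow control.
            return;
        }
        // Opening happens OUTSIDE the tolerant block, so a missing file or a
        // genuine permissions problem is reported to the caller.
        try (FileChannel channel = FileChannel.open(
                path, isDir ? StandardOpenOption.READ : StandardOpenOption.WRITE)) {
            try {
                channel.force(true);
            } catch (IOException e) {
                if (!isDir) {
                    throw e;
                }
                // Ignored for directories only: some platforms cannot fsync them.
            }
        }
    }
}
```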
Ignacio Vera 88c5817c01
LUCENE-8775: Compute properly the bridge between a polygon and a hole when sharing a vertex. 2019-06-11 07:01:42 +02:00
Koen De Groote 67104dd615 LUCENE-8847: Code Cleanup: Rewrite StringBuilder.append with concatted strings (#707)
This specific commit affects all points in the codebase where the argument of a StringBuilder.append() call is itself a regular String concatenation.
This defeats the purpose of using StringBuilder and also introduces an extra allocation.
These changes should avoid that.

ant tests have run, succeeded on local machine.

Removing test files from the changes.

Another suggested rework.
2019-06-10 18:07:43 +02:00
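A small before/after illustration of the cleanup described above. The class and method names are hypothetical; both methods produce the same string, but the first builds an intermediate `String` just to pass it to `append`.

```java
// Hypothetical example of the rewrite; not taken from the Lucene sources.
final class AppendSketch {

    /** Before: the concatenation allocates a temporary String first. */
    static String before(String key, int value) {
        StringBuilder sb = new StringBuilder();
        sb.append(key + "=" + value);
        return sb.toString();
    }

    /** After: each piece is appended directly to the builder. */
    static String after(String key, int value) {
        StringBuilder sb = new StringBuilder();
        sb.append(key).append('=').append(value);
        return sb.toString();
    }
}
```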
Alan Woodward e8950f4a52 LUCENE-8845: Allow configurable maxExpansions for prefix/wildcard intervals 2019-06-10 16:14:42 +01:00
Atri Sharma f84afab008 LUCENE-8362: Introduce DocValues Fields and Range Queries for native Range Field Types
This commit introduces a new DocValues field and corresponding
range query for binary ranges. These classes are extended into
concrete implementations for each of Int, Long, Float and Double
range fields.
2019-06-10 15:14:15 +02:00
Colin Goodheart-Smithe 5ef2b3f6b8 LUCENE-8815: Adds a DoubleValues implementation for feature fields (#687)
This change adds a static method FeatureField#newDoubleValues() which can be used to retrieve the values of a feature for documents directly rather than having to store the values in a numeric field alongside the feature field.
2019-06-10 09:07:24 +02:00
Tim Underwood 97ca9df7ef LUCENE-8834: Cache the SortedNumericDocValues.docValueCount() value whenever it is used in a loop (#698) 2019-06-10 08:56:21 +02:00
Namgyu Kim fe58b6f3a2 LUCENE-8812: disable Java 9 try-with-resources style in TestKoreanNumberFilter
Signed-off-by: Namgyu Kim <namgyu@apache.org>
2019-06-10 01:56:34 +09:00
Namgyu Kim 5a75b8a080 LUCENE-8812: Add new KoreanNumberFilter that can change Hangul character to number and process decimal point
Signed-off-by: Namgyu Kim <namgyu@apache.org>
2019-06-09 23:00:14 +09:00
Michael Sokolov e85c6e6429 LUCENE-8844: bump FST version and fix related CHANGES entry 2019-06-08 10:22:02 -04:00
Atri Sharma 965fd194d1 LUCENE-8825: Improve CheckHits's Printing Capabilities
Signed-off-by: Adrien Grand <jpountz@gmail.com>
2019-06-07 18:47:41 +02:00
Alan Woodward 67677d995e LUCENE-8828: Make unorderedNoOverlaps a separate IntervalsSource 2019-06-07 14:45:56 +01:00
Jan Høydahl 8d6fd7298f LUCENE-8818: Fix smokeTestRelease.py encoding bug 2019-06-06 21:42:24 +02:00
Ignacio Vera 05ea0f2d54
LUCENE-8775: Improve tessellator to better handle cases where a hole shares a vertex with the polygon 2019-06-06 08:58:49 +02:00
Ignacio Vera c6390f80d1
LUCENE-8831: Fixed LatLonShapeBoundingBoxQuery .hashCode method 2019-06-05 15:57:10 +02:00
Jan Høydahl 73b15d8984 Add back-compat indices for 7.7.2 2019-06-05 11:16:41 +02:00
Jan Høydahl be18d8eaa2 Add bugfix version 7.7.2 2019-06-05 02:31:09 +02:00
Cao Manh Dat 301ea0e462
SOLR-13434: OpenTracing support for Solr (#685) 2019-06-04 20:04:11 +01:00
Erick Erickson 7ebeab71f4 SOLR-8346: Upgrade Zookeeper to version 3.5.5 2019-06-03 17:50:35 -07:00
Simon Willnauer d488156921 Merge branch 'master' into LUCENE-8813 2019-05-31 21:05:41 +02:00
Simon Willnauer 086088e699 more feedback 2019-05-29 20:09:17 +02:00
Andrzej Bialecki 2020eb43de Add backcompat indexes for 8.1.1. 2019-05-29 18:22:22 +02:00
Simon Willnauer fceee244dd apply feedback 2019-05-29 09:59:39 +02:00
Simon Willnauer 165d2d5ff5 LUCENE-8813: Ensure we never apply deletes from a closed DWPTDeleteQueue
Today we don't have strong protection against adding or applying deletes / updates
on or from an already flushed delete queue. DWPTDeleteQueue instances are replaced
once we do a full flush in order to reopen an NRT reader or commit the IndexWriter.

In LUCENE-8813 we tripped an assert that used to protect us from such a situation
but it didn't take all corner cases from concurrent flushing into account. This change
adds a stronger protection and ensures that we neither apply a closed delete queue nor
add any updates or deletes to it.

This change also allows speculatively freezing the global buffer, which might return
null now if the queue has already been closed. This is now possible since we ensure that
we never see modifications to the queue after it's been closed and that happens right after
the last DWPT for the ongoing full flush is done flushing.
2019-05-28 16:44:34 +02:00
jimczi db334c792b LUCENE-8784: Restore the Korean's part of speech tag for NGRAM.
The part of speech tag for unigram has been changed inadvertently in a previous commit (not released).
This change restores the original value that is also set on the serialized unknown dictionary.
2019-05-28 12:01:05 +02:00
Simon Willnauer 171d7f131f LUCENE-8813: Count down latch in finally block.
This test hangs until it times out when an assertion is tripped
in the indexing thread. Counting down the latch in a finally block
will cause the test to fail earlier.
2019-05-28 10:55:18 +02:00
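The pattern can be sketched like this. It is a minimal illustration with hypothetical names, not the test from the commit: because `countDown()` sits in a `finally` block, the waiting thread is released even when the worker throws.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical helper illustrating the pattern; not the Lucene test code.
final class LatchSketch {

    /** Runs work on a new thread; returns true if it finished (or failed) in time. */
    static boolean finishedInTime(Runnable work, long timeoutMillis) {
        CountDownLatch done = new CountDownLatch(1);
        Thread worker = new Thread(() -> {
            try {
                work.run();
            } finally {
                // Always release the waiter, even if work.run() throws.
                done.countDown();
            }
        });
        // Swallow the failure to keep this sketch quiet; a real test would record it.
        worker.setUncaughtExceptionHandler((thread, error) -> { });
        worker.start();
        try {
            return done.await(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

Without the `finally`, a throwing worker would never count down and the caller would block until the timeout expires.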
Adrien Grand c252b92caa LUCENE-8135: Fix number of clauses randomization. 2019-05-28 09:53:58 +02:00
Namgyu Kim a556925eb8 LUCENE-8784: The KoreanTokenizer now preserves punctuations if discardPunctuation is set to false (defaults to true).
Signed-off-by: Namgyu Kim <kng0828@gmail.com>
Signed-off-by: jimczi <jimczi@apache.org>
2019-05-27 15:15:24 +02:00
Colin Goodheart-Smithe 46060d88a2 LUCENE-8803: Provide a FieldComparator to allow sorting by a feature from a FeatureField (#680)
This change adds a SortField which allows a convenient way to sort search hits using a feature from a FeatureField.
2019-05-24 08:45:57 +02:00
Nhat Nguyen 0435348b29 LUCENE-8809: Ensure release segment states
If refresh and rollback happen concurrently, then we can leave segment
states unreleased, which leaks the refCount of some SegmentReaders.
2019-05-23 11:25:28 -04:00
Adrien Grand 97046c7054 LUCENE-8757: Fix test bug. 2019-05-22 09:10:52 +02:00
Atri Sharma 87e936f1bb LUCENE-8757: Improving Default Segments To Thread Mapping Algorithm
The current slicing algorithm assigns a thread per segment, which
can be detrimental to performance in case the distribution has
a large number of small segments. The patch introduces a slicing
algorithm which coalesces smaller segments to a single thread,
thus reducing the impact of context switching by limiting the
number of threads.

Signed-off-by: Adrien Grand <jpountz@gmail.com>
2019-05-21 20:18:42 +02:00
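One way such coalescing could work is sketched below. This is an illustration under assumptions, not the algorithm from the patch: segments are represented only by their document counts, and the two limits (`maxDocsPerSlice`, `maxSegmentsPerSlice`) are hypothetical parameters chosen by the caller.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical greedy grouping; parameter names and limits are assumptions.
final class SliceSketch {

    /** Groups segments (given as doc counts) into slices, largest segments first. */
    static List<List<Integer>> slices(int[] segmentDocCounts,
                                      int maxDocsPerSlice,
                                      int maxSegmentsPerSlice) {
        int[] sorted = segmentDocCounts.clone();
        Arrays.sort(sorted); // ascending; iterated backwards below

        List<List<Integer>> result = new ArrayList<>();
        List<Integer> current = new ArrayList<>();
        long docsInCurrent = 0;

        for (int i = sorted.length - 1; i >= 0; i--) {
            int docs = sorted[i];
            boolean full = docsInCurrent + docs > maxDocsPerSlice
                    || current.size() >= maxSegmentsPerSlice;
            // Start a new slice when adding this segment would exceed a limit.
            if (!current.isEmpty() && full) {
                result.add(current);
                current = new ArrayList<>();
                docsInCurrent = 0;
            }
            current.add(docs);
            docsInCurrent += docs;
        }
        if (!current.isEmpty()) {
            result.add(current);
        }
        return result;
    }
}
```

With five segments of 100, 10, 900, 20 and 30 documents and limits of 1000 documents and 3 segments per slice, this yields two slices instead of five, so two threads do the work rather than one per segment.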
Namgyu Kim 5a694ea26f LUCENE-8805: Parameter changes for stringField() in StoredFieldVisitor
Signed-off-by: Namgyu Kim <kng0828@gmail.com>
Signed-off-by: Adrien Grand <jpountz@gmail.com>
2019-05-21 20:18:42 +02:00
Uwe Schindler c756b50ae4 LUCENE-8807: Change all download URLs in build files to HTTPS 2019-05-21 17:06:00 +02:00
jimczi 4640a527a4 LUCENE-8770: BlockMaxConjunctionScorer now leverages two-phase iterators in order to avoid executing the second phase when scorers don't intersect 2019-05-21 11:35:44 +02:00
Adrien Grand ec6ac9756c LUCENE-8804: Forbid calls to putAttribute on frozen FieldType instances. 2019-05-20 20:23:08 +02:00
Andrzej Bialecki ed4b789bf4 Add new version number for 8.1.1 release. Move the SOLR-13475 entry to the correct section. 2019-05-20 20:14:21 +02:00
Noble Paul bd64ed6d2a
SOLR-13437: fork noggit code into Solr (#666)
* SOLR-13437: fork noggit code into Solr
2019-05-16 11:10:27 +10:00
Ishan Chattopadhyaya 9189472d70 Adding backcompat indexes for 8.1 2019-05-13 16:30:57 +05:30
Atri Sharma c988b04b18 LUCENE-7840: Avoid Building Scorer Supplier For Redundant SHOULD Clauses
For boolean queries, we should eliminate redundant SHOULD clauses during
query rewrite and not build the scorer supplier, as opposed to
eliminating them during weight construction

Signed-off-by: jimczi <jimczi@apache.org>
2019-05-09 09:00:18 +02:00
Simon Willnauer a759a5d47c Fix Changes.txt entry 2019-05-08 11:43:49 +02:00
Simon Willnauer e8d88a5b54
LUCENE-8785: Ensure threadstates are locked before iterating (#664)
Ensure new threadstates are locked before retrieving the
number of active threadstates. This causes assertion errors
and potentially broken field attributes in the IndexWriter when
IndexWriter#deleteAll is called while actively indexing.
2019-05-08 11:19:21 +02:00
Dawid Weiss 5c9e7d5351 LUCENE-8781: FST lookup performance has been improved in many cases by encoding Arcs using full-sized arrays with gaps. The new encoding is enabled for postings in the default codec and for suggesters. (Mike Sokolov) 2019-05-06 11:19:35 +02:00
Christine Poerschke 6842676952 LUCENE-8756: ant precommit (ant check-forbidden-apis) fix 2019-05-01 19:31:42 +01:00
Ishan Chattopadhyaya c808b2f5fe Adding 8.2 version 2019-05-01 15:15:49 +05:30
Thomas Lemmé 424558ff88 LUCENE-8787: DateRangePrefixTree now parses milliseconds when num digits != 3 2019-05-01 00:32:19 -04:00
Uwe Schindler 87c16882ae LUCENE-8738, LUCENE-8786: Fix ECJ linter to accept Java 11 syntax 2019-04-30 19:40:00 +02:00
Mike McCandless 4a76ad7263 LUCENE-8756: add CHANGES entry 2019-04-30 12:15:49 -04:00