35119 Commits

Author SHA1 Message Date
Tomoko Uchida 2160d7239d Revert "LUCENE-9448: clean up unused start scripts for luke."
This reverts commit 16104090fb.
2021-05-27 19:22:29 +09:00
Alan Woodward 1e7d8146ff
LUCENE-9454: Remove version field on Analyzer (#154)
Version switching on Analyzer behaviour should be implemented
in the various component factories, rather than on a mutable
setting on Analyzer itself.
2021-05-26 17:34:01 +01:00
Tomoko Uchida 16104090fb LUCENE-9448: clean up unused start scripts for luke. 2021-05-26 23:32:52 +09:00
Alan Woodward 4464cd87cc
LUCENE-9204: Move SpanQuery and subclasses to the queries module (#152) 2021-05-26 10:12:14 +01:00
Dawid Weiss 5912e65434
LUCENE-9974: The test-framework module should apply the test ruleset for forbidden APIs. (#153) 2021-05-26 10:19:55 +02:00
Alan Woodward 93844d3846
LUCENE-9204: Move helper methods from TestMatchesIterator into a base class (#151)
TestMatchesIterator lives in core/tests and does various sanity checks
on the matches returned by various queries, including Span queries.
The Span-specific tests cannot stay here once Spans have been moved
out of core. This commit pulls various helper methods from this class
into a base class in the test framework, so that we can move the
Spans tests into their own class and keep coverage once things have
been migrated.
2021-05-25 14:16:05 +01:00
Alan Woodward 4b55ae5de4
LUCENE-9204: Remove Spans references from DisiWrapper (#150)
We have a number of helper classes in o.a.l.search that aid the
implementation of two-phase iteration over disjunctions. These have
some Spans-specific code, which will stop compiling once Spans
are moved into the queries module. This commit removes the
Spans references from the main code and duplicates the helper
code within the Spans package.
2021-05-25 14:14:47 +01:00
Alan Woodward 5e0e7a5479
LUCENE-9204: Make ConjunctionDISI package-private and add ConjunctionUtils factory class (#148)
ConjunctionDISI is really an internal implementation of DocIdSetIterator,
and would ideally be package-private. However, it is used in a few
other places:
* directly in ConjunctionSpans
* as a utility in the facet and join modules

This commit adds a public helper class ConjunctionUtils that allows easy
intersection of iterators for use by other modules. This means that
ConjunctionDISI itself can become package-private. It also removes
a reference to Spans from core classes, which will make it easier to
migrate Spans to the queries module. ConjunctionSpans and
ConjunctionIntervalIterator now use the public Utils class, and intervals
no longer need their own ConjunctionDISI implementation.
2021-05-25 12:07:20 +01:00
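
To illustrate the kind of iterator intersection that commit 5e0e7a5479 describes, here is a minimal sketch of a two-pointer conjunction over sorted doc ID lists. It is not the ConjunctionDISI or ConjunctionUtils code; the class `SortedIntersection` and its `intersect` method are hypothetical names, and plain `int[]` arrays stand in for doc ID iterators.

```java
import java.util.Arrays;

public class SortedIntersection {
    // Hypothetical helper: returns the values present in both inputs.
    // Both arrays are assumed to be sorted ascending without duplicates,
    // as doc IDs from an iterator would be.
    public static int[] intersect(int[] a, int[] b) {
        int[] out = new int[Math.min(a.length, b.length)];
        int i = 0, j = 0, n = 0;
        while (i < a.length && j < b.length) {
            if (a[i] < b[j]) {
                i++;                 // a is behind, advance it
            } else if (a[i] > b[j]) {
                j++;                 // b is behind, advance it
            } else {
                out[n++] = a[i];     // both sides agree on this doc ID
                i++;
                j++;
            }
        }
        return Arrays.copyOf(out, n);
    }

    public static void main(String[] args) {
        int[] docsA = {1, 3, 5, 8};
        int[] docsB = {3, 4, 8, 9};
        System.out.println(Arrays.toString(intersect(docsA, docsB))); // prints [3, 8]
    }
}
```

A shared public helper of this shape is what lets other modules intersect iterators without depending on the internal class.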
Mike McCandless 654e978190 LUCENE-9967: don't throw NullPointerException while handling a different root-cause exception in ReplicaNode.start 2021-05-24 10:51:26 -04:00
Dawid Weiss f7fbb9eda5 Add a small clarification about the required Java version for gradle. 2021-05-24 09:59:54 +02:00
Nhat Nguyen a12260eb95
LUCENE-9827: Update backward codec in Lucene 9.0 (#147)
We need to update the reading logic of the backward codec in Lucene 9 
for LUCENE-9827 and LUCENE-9935 as we have backported them to Lucene 8.

Relates apache/lucene-solr#2495
Relates apache/lucene-solr#2494
2021-05-20 08:49:43 -04:00
Houston Putman f919672647 LUCENE-9936: Add Gpg Signing help info to gradle help command 2021-05-19 10:43:31 -05:00
Greg Miller 693b6d3e34
move changes entry for backport to 8.9 (#145)
Co-authored-by: Greg Miller <gmiller@amazon.com>
2021-05-19 07:04:23 -04:00
Greg Miller 65820e5170
LUCENE-9953: Make FacetResult#value accurate for LongValueFacetCounts multi-value doc cases (#131)
Co-authored-by: Greg Miller <gmiller@amazon.com>
2021-05-18 12:37:53 -04:00
Greg Miller ade50f0796
LUCENE-9950: New facet counting implementation for general string doc value fields (#133)
Co-authored-by: Greg Miller <gmiller@amazon.com>
2021-05-18 10:28:00 -04:00
Dawid Weiss ba9fee502b
LUCENE-9960: Avoid unnecessary top element replacement for equal elements in PriorityQueue. (#141) 2021-05-18 08:49:53 +02:00
Nhat Nguyen 406aef8a4b LUCENE-9935: Fix testRandomStoredFieldsWithIndexSort
Skip verifying if the list of live ids is empty
2021-05-16 18:00:04 -04:00
Nhat Nguyen eaaf13aa86 LUCENE-9935: Move CHANGES entry from 9.0 to 8.9 2021-05-16 16:54:12 -04:00
Dawid Weiss 0d05b21314
lucene/benchmarks: correct micro-standard.alg. (#71) 2021-05-14 11:07:25 -04:00
Adrien Grand 8045a170f4 LUCENE-9932: Spotless. 2021-05-14 13:58:42 +02:00
Adrien Grand 2c04ab5835 LUCENE-9958: Fixed performance regression for boolean queries that configure a minimum number of matching clauses. 2021-05-14 13:55:12 +02:00
Adrien Grand 8e94a591d8 LUCENE-9932: Fix test bug. 2021-05-14 09:51:12 +02:00
neoReMinD fd4b3c81d5
LUCENE-9932: Performance improvement for BKD index building (#91) 2021-05-14 09:33:43 +02:00
Robert Muir f215a55bc9
LUCENE-9827: move CHANGES.txt entry from 9.0 to 8.9 2021-05-13 12:37:58 -04:00
Nhat Nguyen 9a17d67658
LUCENE-9935: Enable bulk merge for stored fields with index sort (#134)
This commit enables bulk-merges (i.e., raw chunk copying) for stored 
fields when index sort is enabled
2021-05-12 21:00:18 -04:00
Gus Heck ad43841daf LUCENE-9575 add missing changes entry 2021-05-12 20:47:12 -04:00
Michael Wechner a9522c7179
LUCENE-9954: README for Luke (#135) 2021-05-13 00:53:53 +09:00
Jan Høydahl 7dd7077609
LUCENE-9929 NorwegianNormalizationFilter (#84) 2021-05-12 14:31:26 +02:00
Tomoko Uchida 6ebf959502
reorganize termvectors format description (javadocs). (#130) 2021-05-09 08:45:24 +09:00
Tomoko Uchida 891b192dcf
LUCENE-9456: revise format description of TermVectorsFormat (#129) 2021-05-07 08:27:07 +09:00
Robert Muir a7a02519f0
LUCENE-9843: Remove compression option on default codec's docvalues 2021-05-06 17:07:41 -04:00
Michael Sokolov e2788336d4
LUCENE-9905: PerFieldVectorFormat (#114)
* LUCENE-9905: PerFieldVectorFormat
2021-05-06 14:09:22 -04:00
Dawid Weiss aac6581f6e
LUCENE-9915: Add generation/ checksumming task for gen_ForUtil.py (#126) 2021-05-05 22:03:06 +02:00
Chris Hostetter a6cf46dada LUCENE-9936: Add gpg signing of the tgz & zip distribution files 2021-05-04 10:20:59 -07:00
Mayya Sharipova b5a77de512
Fix failures in TestPerFieldConsistency (#125)
This test assumes that there is no merging,
and was failing when there were merges.
This fixes the test by setting NoMergePolicy for
IndexWriter.

Relates to LUCENE-9334
Relates to #11
2021-05-04 09:51:55 -04:00
Tomoko Uchida c33d211d2a
LUCENE-4198: add format description for term impacts to javadocs (#115) 2021-05-04 10:45:54 +09:00
Greg Miller 650cad19a2
LUCENE-9948: Automatically detect multi- vs. single-valued cases in LongValueFacetCounts (#122)
The public API in LongValueFacetCounts previously required the user to specify whether-or-not a field being counted should be single- or multi-valued (i.e., is it NumericDocValues or SortedNumericDocValues). Since we can detect this automatically, it seems unnecessary to ask users to specify.

Co-authored-by: Greg Miller <gmiller@amazon.com>
2021-05-03 11:18:38 -04:00
Ignacio Vera a91bde5104
LUCENE-9047: Write checksum as big endian in NRT replicator 2021-05-03 09:29:16 +02:00
Ignacio Vera b84e0c272b
LUCENE-9047: Directory API is now little endian 2021-05-03 07:49:56 +02:00
Dawid Weiss 8eb4eb2611
LUCENE-9909: add checksums of included files for some jflex generation tasks. Fix a task ordering issue with spotless. (#121)
* LUCENE-9909: Some jflex regeneration tasks should have proper dependencies and also check the checksums of included files.

* Force a dependency on low-level spotless tasks so that they're always properly ordered (hell!). Update ASCIITLD and regenerate the remaining code. Add cross-dependencies between generation tasks that take includes as input.
2021-05-02 19:17:18 +02:00
Robert Muir 06907a2c12
LUCENE-9188: Add jacoco code coverage support to gradle (#119)
Co-authored-by: Dawid Weiss <dawid.weiss@carrotsearch.com>
Co-authored-by: Uwe Schindler <uschindler@apache.org>
2021-05-02 16:24:06 +02:00
Tomoko Uchida 0e8c3080da LUCENE-9947: embed project version in the launch script path 2021-05-01 20:04:54 +09:00
Tomoko Uchida 7acd3dd54a
LUCENE-9947: Exclude luke javadocs from the documentation site. (#120) 2021-05-01 18:10:56 +09:00
Tomoko Uchida 44a8d7ce39
LUCENE-9947: Exclude luke from the published jar list (#118) 2021-05-01 15:50:46 +09:00
balmukundblr 66062e8991
Add explicit flush to Lucene's benchmarks module (#116)
* Added an explicit Flush Task to flush data at Thread level once it completes the processing

* Included explicit flush per Thread level
2021-04-29 20:45:34 -04:00
Mayya Sharipova a9a3f6529d
Fix regression to account payloads while merging (#103)
Before PR #11, during merging, if any merging segment had payloads
for a certain field, the new merged segment would also have payloads
set up for this field.

PR #11 introduced a bug where the first segment among merging
segments will define if the new merged segment will have
payloads. If the first segment doesn't have payloads, and
others do, the new merged segment mistakenly will not
have payloads set up.

This PR fixes this bug.

Relates to #11
2021-04-29 08:37:59 -04:00
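
The regression described in commit a9a3f6529d comes down to taking a flag from the first segment instead of combining it across all segments. The sketch below contrasts the two behaviours. It is not the Lucene merge code; `PayloadFlagMerge` and both method names are hypothetical, and a list of booleans stands in for the per-segment "field has payloads" information.

```java
import java.util.List;

public class PayloadFlagMerge {
    // Buggy behaviour as described: only the first segment decides.
    public static boolean fromFirstSegment(List<Boolean> segmentHasPayloads) {
        return !segmentHasPayloads.isEmpty() && segmentHasPayloads.get(0);
    }

    // Intended behaviour as described: any segment with payloads is enough.
    public static boolean fromAnySegment(List<Boolean> segmentHasPayloads) {
        for (boolean hasPayloads : segmentHasPayloads) {
            if (hasPayloads) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // First segment has no payloads, later ones do.
        List<Boolean> segments = List.of(false, true, true);
        System.out.println(fromFirstSegment(segments)); // prints false
        System.out.println(fromAnySegment(segments));   // prints true
    }
}
```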
Alan Woodward f7a3587091
LUCENE-9940: DisjunctionMaxQuery shouldn't depend on disjunct order for equals checks (#110)
DisjunctionMaxQuery stores its disjuncts in a Query[], and uses
Arrays.equals() for comparisons in its equals() implementation.
This means that the order in which disjuncts are added to the query
matters for equality checks.

This commit changes DMQ to instead store its disjuncts in a Multiset,
meaning that ordering no longer matters. The getDisjuncts()
method now returns a Collection<Query> rather than a List, and
some tests are changed to use query equality checks rather than
iterating over disjuncts and expecting a particular order.
2021-04-29 09:47:55 +01:00
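
The difference between array equality and multiset equality in commit f7a3587091 can be shown with a small sketch. It does not use Lucene's own Multiset class; `MultisetEquality` and its methods are hypothetical, a `HashMap` of counts stands in for the multiset, and strings stand in for `Query` objects.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

public class MultisetEquality {
    // Builds a count-per-element map, which ignores order but keeps duplicates.
    public static <T> Map<T, Integer> toMultiset(T[] items) {
        Map<T, Integer> counts = new HashMap<>();
        for (T item : items) {
            counts.merge(item, 1, Integer::sum);
        }
        return counts;
    }

    public static <T> boolean equalIgnoringOrder(T[] a, T[] b) {
        return toMultiset(a).equals(toMultiset(b));
    }

    public static void main(String[] args) {
        String[] first = {"title:lucene", "body:lucene", "body:lucene"};
        String[] second = {"body:lucene", "title:lucene", "body:lucene"};
        System.out.println(Arrays.equals(first, second));      // prints false
        System.out.println(equalIgnoringOrder(first, second)); // prints true
    }
}
```

Because counts are kept, two collections that differ only in how often a disjunct appears still compare as unequal.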
Gus Heck 043ed3a91f
LUCENE-9572 adjust changes entry (#112) 2021-04-29 00:23:15 -04:00
Ayushman Singh Chauhan c49bfb8e01
DOC: Fix spelling (#111) 2021-04-28 13:19:34 -04:00
Alan Woodward 90d363ece7
LUCENE-9930: Only load Ukrainian morfologik dictionary once per JVM (#109)
The UkrainianMorfologikAnalyzer was reloading its dictionary every
time it created a new TokenStreamComponents, which meant that
while the analyzer was open it would hold onto one copy of the
dictionary per thread.

This commit loads the dictionary in a lazy static initializer, alongside
its stopword set. It also makes the normalizer charmap a singleton
so that we do not rebuild the same immutable object on every call
to initReader.
2021-04-28 13:51:23 +01:00
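
One common way to get the "load once per JVM, lazily" behaviour described in commit 90d363ece7 is the initialization-on-demand holder idiom, sketched below. This is an assumption about the general technique, not the actual UkrainianMorfologikAnalyzer code; `LazyDictionaryDemo`, `loadDictionary`, and the placeholder word set are hypothetical.

```java
import java.util.Set;
import java.util.concurrent.atomic.AtomicInteger;

public class LazyDictionaryDemo {
    // Counts how often the (hypothetical) expensive load actually runs.
    public static final AtomicInteger LOAD_COUNT = new AtomicInteger();

    private static Set<String> loadDictionary() {
        LOAD_COUNT.incrementAndGet();
        return Set.of("alpha", "beta"); // placeholder for a real dictionary
    }

    // The JVM initializes this nested class on first access only,
    // and class initialization is thread-safe.
    private static final class DictionaryHolder {
        static final Set<String> DICTIONARY = loadDictionary();
    }

    public static Set<String> dictionary() {
        return DictionaryHolder.DICTIONARY;
    }

    public static void main(String[] args) {
        Set<String> first = dictionary();
        Set<String> second = dictionary();
        System.out.println(first == second);  // prints true
        System.out.println(LOAD_COUNT.get()); // prints 1
    }
}
```

Every caller then shares one immutable instance instead of holding a copy per thread.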