35247 Commits

Author SHA1 Message Date
neoReMinD
fd4b3c81d5
LUCENE-9932: Performance improvement for BKD index building (#91) 2021-05-14 09:33:43 +02:00
Robert Muir
f215a55bc9
LUCENE-9827: move CHANGES.txt entry from 9.0 to 8.9 2021-05-13 12:37:58 -04:00
Nhat Nguyen
9a17d67658
LUCENE-9935: Enable bulk merge for stored fields with index sort (#134)
This commit enables bulk merges (i.e., raw chunk copying) for stored
fields when index sort is enabled.
2021-05-12 21:00:18 -04:00
Gus Heck
ad43841daf
LUCENE-9575 add missing changes entry 2021-05-12 20:47:12 -04:00
Michael Wechner
a9522c7179
LUCENE-9954: README for Luke (#135) 2021-05-13 00:53:53 +09:00
Jan Høydahl
7dd7077609
LUCENE-9929 NorwegianNormalizationFilter (#84) 2021-05-12 14:31:26 +02:00
Tomoko Uchida
6ebf959502
reorganize termvectors format description (javadocs). (#130) 2021-05-09 08:45:24 +09:00
Tomoko Uchida
891b192dcf
LUCENE-9456: revise format description of TermVectorsFormat (#129) 2021-05-07 08:27:07 +09:00
Robert Muir
a7a02519f0
LUCENE-9843: Remove compression option on default codec's docvalues 2021-05-06 17:07:41 -04:00
Michael Sokolov
e2788336d4
LUCENE-9905: PerFieldVectorFormat (#114)
* LUCENE-9905: PerFieldVectorFormat
2021-05-06 14:09:22 -04:00
Dawid Weiss
aac6581f6e
LUCENE-9915: Add generation/checksumming task for gen_ForUtil.py (#126) 2021-05-05 22:03:06 +02:00
Chris Hostetter
a6cf46dada
LUCENE-9936: Add gpg signing of the tgz & zip distribution files 2021-05-04 10:20:59 -07:00
Mayya Sharipova
b5a77de512
Fix failures in TestPerFieldConsistency (#125)
This test assumes that there is no merging,
and was failing when there were merges.
This fixes the test by setting NoMergePolicy on the
IndexWriter.

Relates to LUCENE-9334
Relates to #11
2021-05-04 09:51:55 -04:00
Tomoko Uchida
c33d211d2a
LUCENE-4198: add format description for term impacts to javadocs (#115) 2021-05-04 10:45:54 +09:00
Greg Miller
650cad19a2
LUCENE-9948: Automatically detect multi- vs. single-valued cases in LongValueFacetCounts (#122)
The public API in LongValueFacetCounts previously required the user to specify whether a field being counted is single- or multi-valued (i.e., whether it is NumericDocValues or SortedNumericDocValues). Since we can detect this automatically, it seems unnecessary to ask users to specify.

Co-authored-by: Greg Miller <gmiller@amazon.com>
2021-05-03 11:18:38 -04:00
Ignacio Vera
a91bde5104
LUCENE-9047: Write checksum as big endian in NRT replicator 2021-05-03 09:29:16 +02:00
Ignacio Vera
b84e0c272b
LUCENE-9047: Directory API is now little endian 2021-05-03 07:49:56 +02:00
Dawid Weiss
8eb4eb2611
LUCENE-9909: add checksums of included files for some jflex generation tasks. Fix a task ordering issue with spotless. (#121)
* LUCENE-9909: Some jflex regeneration tasks should have proper dependencies and also check the checksums of included files.

* Force a dependency on low-level spotless tasks so that they're always properly ordered (hell!). Update ASCIITLD and regenerate the remaining code. Add cross-dependencies between generation tasks that take includes as input.
2021-05-02 19:17:18 +02:00
Robert Muir
06907a2c12
LUCENE-9188: Add jacoco code coverage support to gradle (#119)
Co-authored-by: Dawid Weiss <dawid.weiss@carrotsearch.com>
Co-authored-by: Uwe Schindler <uschindler@apache.org>
2021-05-02 16:24:06 +02:00
Tomoko Uchida
0e8c3080da
LUCENE-9947: embed project version in the launch script path 2021-05-01 20:04:54 +09:00
Tomoko Uchida
7acd3dd54a
LUCENE-9947: Exclude luke javadocs from the documentation site. (#120) 2021-05-01 18:10:56 +09:00
Tomoko Uchida
44a8d7ce39
LUCENE-9947: Exclude luke from the published jar list (#118) 2021-05-01 15:50:46 +09:00
balmukundblr
66062e8991
Add explicit flush to Lucene's benchmarks module (#116)
* Added an explicit flush task to flush data at the thread level once each thread completes its processing

* Included an explicit flush at the per-thread level
2021-04-29 20:45:34 -04:00
Mayya Sharipova
a9a3f6529d
Fix regression to account payloads while merging (#103)
Before PR #11, during merging, if any merging segment had payloads
for a certain field, the new merged segment would also have payloads
set up for this field.

PR #11 introduced a bug where the first segment among the merging
segments determines whether the new merged segment will have
payloads. If the first segment doesn't have payloads, and
others do, the new merged segment mistakenly will not
have payloads set up.

This PR fixes this bug.

Relates to #11
2021-04-29 08:37:59 -04:00
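The regression described in the commit above (#103) can be illustrated with a minimal sketch. It assumes each merging segment is reduced to a single boolean "has payloads" flag for one field. The class `PayloadMergeSketch` and both method names are hypothetical and are not Lucene code.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical illustration only; this is not the Lucene merge code. */
final class PayloadMergeSketch {

  /** Buggy behaviour as described: only the first segment decides the merged flag. */
  static boolean firstSegmentDecides(List<Boolean> segmentHasPayloads) {
    return !segmentHasPayloads.isEmpty() && segmentHasPayloads.get(0);
  }

  /** Intended behaviour: the merged field has payloads if any merging segment has them. */
  static boolean anySegmentDecides(List<Boolean> segmentHasPayloads) {
    boolean merged = false;
    for (boolean hasPayloads : segmentHasPayloads) {
      merged |= hasPayloads;
    }
    return merged;
  }

  public static void main(String[] args) {
    // The first segment has no payloads for the field, but a later one does.
    List<Boolean> segments = Arrays.asList(false, true, false);
    System.out.println(firstSegmentDecides(segments)); // false: payload flag is lost
    System.out.println(anySegmentDecides(segments));   // true: payload flag is kept
  }
}
```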
Alan Woodward
f7a3587091
LUCENE-9940: DisjunctionMaxQuery shouldn't depend on disjunct order for equals checks (#110)
DisjunctionMaxQuery stores its disjuncts in a Query[], and uses
Arrays.equals() for comparisons in its equals() implementation.
This means that the order in which disjuncts are added to the query
matters for equality checks.

This commit changes DMQ to instead store its disjuncts in a Multiset,
meaning that ordering no longer matters. The getDisjuncts()
method now returns a Collection<Query> rather than a List, and
some tests are changed to use query equality checks rather than
iterating over disjuncts and expecting a particular order.
2021-04-29 09:47:55 +01:00
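The difference between array equality and multiset equality in LUCENE-9940 above can be sketched as follows. This sketch assumes plain strings standing in for `Query` objects and a `HashMap` of counts standing in for the multiset. `DisjunctEqualitySketch` and its methods are hypothetical names and do not reflect the actual DisjunctionMaxQuery implementation.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration only; strings stand in for Query instances. */
final class DisjunctEqualitySketch {

  /** Order-sensitive comparison, as with Arrays.equals() on an array of disjuncts. */
  static boolean arrayEquals(String[] a, String[] b) {
    return Arrays.equals(a, b);
  }

  /** Counts occurrences so that duplicates are preserved but insertion order is not. */
  static Map<String, Integer> toCounts(List<String> disjuncts) {
    Map<String, Integer> counts = new HashMap<>();
    for (String disjunct : disjuncts) {
      counts.merge(disjunct, 1, Integer::sum);
    }
    return counts;
  }

  /** Order-insensitive comparison: equal when every disjunct occurs equally often. */
  static boolean multisetEquals(List<String> a, List<String> b) {
    return toCounts(a).equals(toCounts(b));
  }

  public static void main(String[] args) {
    String[] first = {"title:foo", "body:foo"};
    String[] second = {"body:foo", "title:foo"};
    System.out.println(arrayEquals(first, second)); // false
    System.out.println(multisetEquals(Arrays.asList(first), Arrays.asList(second))); // true
  }
}
```

Counting occurrences rather than using a plain set keeps two queries with a different number of repeated disjuncts unequal.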
Gus Heck
043ed3a91f
LUCENE-9572 adjust changes entry (#112) 2021-04-29 00:23:15 -04:00
Ayushman Singh Chauhan
c49bfb8e01
DOC: Fix spelling (#111) 2021-04-28 13:19:34 -04:00
Alan Woodward
90d363ece7
LUCENE-9930: Only load Ukrainian morfologik dictionary once per JVM (#109)
The UkrainianMorfologikAnalyzer was reloading its dictionary every
time it created a new TokenStreamComponents, which meant that
while the analyzer was open it would hold onto one copy of the
dictionary per thread.

This commit loads the dictionary in a lazy static initializer, alongside
its stopword set. It also makes the normalizer charmap a singleton
so that we do not rebuild the same immutable object on every call
to initReader.
2021-04-28 13:51:23 +01:00
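One common way to get a "lazy static initializer" like the one LUCENE-9930 above describes is the initialization-on-demand holder idiom. The sketch below is an assumption about the general technique, not the code from that commit. `LazyDictionarySketch`, `Holder`, `LOAD_COUNT`, and the in-memory word set are all hypothetical stand-ins for the real dictionary loading.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical illustration only; the "dictionary" is a small in-memory set. */
final class LazyDictionarySketch {

  /** Counts how often the expensive load runs, for demonstration purposes. */
  static final AtomicInteger LOAD_COUNT = new AtomicInteger();

  /** The JVM initializes this nested class once, on first access to DICTIONARY. */
  private static final class Holder {
    static final Set<String> DICTIONARY = load();
  }

  /** Stand-in for an expensive resource load. */
  private static Set<String> load() {
    LOAD_COUNT.incrementAndGet();
    return Collections.unmodifiableSet(new HashSet<>(Arrays.asList("alpha", "beta")));
  }

  /** Every caller, on any thread, receives the same shared immutable instance. */
  static Set<String> dictionary() {
    return Holder.DICTIONARY;
  }

  public static void main(String[] args) {
    Set<String> first = dictionary();
    Set<String> second = dictionary();
    System.out.println(first == second);  // true
    System.out.println(LOAD_COUNT.get()); // 1
  }
}
```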
Gus Heck
0c33e621f9
LUCENE-9574 adjust changes entry 2021-04-27 23:13:11 -04:00
Michael Sokolov
45bd06c804
LUCENE-9905: rename Lucene90VectorFormat and its reader and writer 2021-04-27 18:59:40 -04:00
Michael Sokolov
6d4b5eaba3
LUCENE-9905: rename VectorValues.SearchStrategy to VectorValues.SimilarityFunction 2021-04-27 16:18:58 -04:00
Julie Tibshirani
3115f85697
LUCENE-9908: Move VectorValues#search to LeafReader (#104)
This PR removes `VectorValues#search` in favor of exposing NN search through
`VectorReader#search` and `LeafReader#searchNearestVectors`. It also marks the
vector methods on `LeafReader` as experimental.
2021-04-26 11:26:49 -07:00
Ignacio Vera
6b386e7e68
LUCENE-9047: Remove unnecessary ByteBuffersDataOutput in BKD writer (#102) 2021-04-26 10:28:26 +02:00
Kai
1b1fd7206b
Use HTTPS for documentation link (#105) 2021-04-24 09:19:16 -04:00
Robert Muir
044d152d95
LUCENE-9928: speed up analysis/icu regeneration (#82)
The compilation of the library is slow, so disable optimization, as it doesn't speed up our usage of the gennorm2 tool.
Use a better heuristic for make parallelism (tests.jvms rather than just a hardcoded value of four).
2021-04-22 07:24:44 -04:00
John Carlson
2c43f57f91
Update gradle to 6.8.3 (#100) 2021-04-21 21:02:37 +02:00
Tomoko Uchida
5f5d1949e9
LUCENE-9353: revise format documentation of Lucene90BlockTreeTermsWriter (#90) 2021-04-20 23:36:34 +09:00
Ignacio Vera
5592d582b8
LUCENE-9047: Adapt big endian dependent code to work in little endian 2021-04-20 10:55:19 +02:00
Ignacio Vera
e0436872c4
LUCENE-9907: Move PackedInts#getReaderNoHeader() to backwards codec 2021-04-20 09:09:38 +02:00
Ignacio Vera
b0662c807c
LUCENE-9907: Remove unused method PackedInts.Mutable#save 2021-04-19 14:52:21 +02:00
Ignacio Vera
2a7951cd30
LUCENE-9907: Remove unused methods in PackedInts (#94) 2021-04-19 14:10:49 +02:00
Dawid Weiss
bd8f182b13
LUCENE-9933: Add non-file properties to wrapped regenerate checksums (#95) 2021-04-19 13:37:47 +02:00
Ignacio Vera
936b3451af
LUCENE-9907: Remove unused BlockPackedReader (#93) 2021-04-19 09:47:44 +02:00
Ignacio Vera
d15231709a
LUCENE-9907: Remove dependency on PackedInts#getReaderNoHeader in MonotonicBlockPackedReader (#85) 2021-04-19 07:18:41 +02:00
Dawid Weiss
beafd113de
LUCENE-9931: Rename checksummed regen. tasks FooInternal and generated wrappers Foo (#88) 2021-04-16 22:35:51 +02:00
Mayya Sharipova
52e2abc665
Temporarily mute BaseTermVectorsFormatTestCase::testMerge (#89)
Relates to #11, and #86
2021-04-16 15:27:17 -04:00
Mayya Sharipova
49c7cc1197
Fix test that modifies schema (#87)
LUCENE-9334 requires that docs have the same schema
across the whole index.
This fixes the test that attempts to modify the schema of the "number" field
from DocValues and Points to just DocValues.

Relates to #11
2021-04-15 17:48:49 -04:00
Mayya Sharipova
9a346e3739
Temporarily mute TestLucene50TermVectorsFormat:testMerge (#86)
Relates to #11
2021-04-15 11:43:29 -04:00
Ignacio Vera
873ac5f162
LUCENE-9907: Remove PackedInts#getReaderNoHeader dependency on TermsVectorFieldsFormat (#72) 2021-04-15 16:04:13 +02:00
Mayya Sharipova
d03662c48b
LUCENE-9334 Consistency of field data structures
Require consistency between data-structures on a per-field basis

A field must be indexed with the same index options and data-structures across
all documents. Thus, for example, it is not allowed to have one document
where a certain field is indexed with doc values and points, and another document 
where the same field is indexed only with points. 
But it is allowed for a document not to have a certain field at all.

As a consequence of this, doc values updates are
only applicable for fields that are indexed with doc values only.
2021-04-14 15:00:41 -04:00
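The per-field consistency rule from LUCENE-9334 above can be sketched as a simple check. This sketch assumes a field's schema is reduced to a set of data-structure kinds. `FieldSchemaSketch`, its `Structure` enum, and `addField` are hypothetical and do not mirror Lucene's actual field-info handling.

```java
import java.util.EnumSet;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical illustration only; this is not how Lucene tracks field infos. */
final class FieldSchemaSketch {

  enum Structure { DOC_VALUES, POINTS, POSTINGS }

  /** Schema recorded the first time each field name is seen. */
  private final Map<String, Set<Structure>> schema = new HashMap<>();

  /** Records the field's schema on first use and rejects any later mismatch. */
  void addField(String name, Set<Structure> structures) {
    Set<Structure> previous = schema.putIfAbsent(name, new HashSet<>(structures));
    if (previous != null && !previous.equals(structures)) {
      throw new IllegalArgumentException(
          "Field \"" + name + "\" was indexed with " + previous + " but now uses " + structures);
    }
  }

  public static void main(String[] args) {
    FieldSchemaSketch index = new FieldSchemaSketch();
    index.addField("number", EnumSet.of(Structure.DOC_VALUES, Structure.POINTS));
    // Same structures again: accepted. A document may also omit the field entirely.
    index.addField("number", EnumSet.of(Structure.DOC_VALUES, Structure.POINTS));
    try {
      // Points only, where doc values and points were used before: rejected.
      index.addField("number", EnumSet.of(Structure.POINTS));
    } catch (IllegalArgumentException e) {
      System.out.println("rejected: " + e.getMessage());
    }
  }
}
```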