lucene

Commit Graph

Author	SHA1	Message	Date
Dawid Weiss	8eb4eb2611	LUCENE-9909: add checksums of included files for some jflex generation tasks. Fix a task ordering issue with spotless. (#121 ) * LUCENE-9909: Some jflex regeneration tasks should have proper dependencies and also check the checksums of included files. * Force a dependency on low-level spotless tasks so that they're always properly ordered (hell!). Update ASCIITLD and regenerate the remaining code. Add cross-dependencies between generation tasks that take includes as input.	2021-05-02 19:17:18 +02:00
Robert Muir	06907a2c12	LUCENE-9188: Add jacoco code coverage support to gradle (#119 ) Co-authored-by: Dawid Weiss <dawid.weiss@carrotsearch.com> Co-authored-by: Uwe Schindler <uschindler@apache.org>	2021-05-02 16:24:06 +02:00
Tomoko Uchida	0e8c3080da	LUCENE-9947: embed project version in the launch script path	2021-05-01 20:04:54 +09:00
Tomoko Uchida	7acd3dd54a	LUCENE-9947: Exclude luke javadocs from the documentation site. (#120 )	2021-05-01 18:10:56 +09:00
Tomoko Uchida	44a8d7ce39	LUCENE-9947: Exclude luke from the published jar list (#118 )	2021-05-01 15:50:46 +09:00
balmukundblr	66062e8991	Add explicit flush to Lucene's benchmarks module (#116) * Added an explicit Flush Task to flush data at the thread level once it completes processing * Included explicit flush per thread level	2021-04-29 20:45:34 -04:00
Mayya Sharipova	a9a3f6529d	Fix regression to account for payloads while merging (#103) Before PR #11, during merging, if any merging segment had payloads for a certain field, the new merged segment would also have payloads set up for this field. PR #11 introduced a bug where the first segment among the merging segments determines whether the new merged segment will have payloads. If the first segment doesn't have payloads and others do, the new merged segment will mistakenly not have payloads set up. This PR fixes this bug. Relates to #11	2021-04-29 08:37:59 -04:00
Alan Woodward	f7a3587091	LUCENE-9940: DisjunctionMaxQuery shouldn't depend on disjunct order for equals checks (#110 ) DisjunctionMaxQuery stores its disjuncts in a Query[], and uses Arrays.equals() for comparisons in its equals() implementation. This means that the order in which disjuncts are added to the query matters for equality checks. This commit changes DMQ to instead store its disjuncts in a Multiset, meaning that ordering no longer matters. The getDisjuncts() method now returns a Collection<Query> rather than a List, and some tests are changed to use query equality checks rather than iterating over disjuncts and expecting a particular order.	2021-04-29 09:47:55 +01:00
Gus Heck	043ed3a91f	LUCENE-9572 adjust changes entry (#112 )	2021-04-29 00:23:15 -04:00
Ayushman Singh Chauhan	c49bfb8e01	DOC: Fix spelling (#111 )	2021-04-28 13:19:34 -04:00
Alan Woodward	90d363ece7	LUCENE-9930: Only load Ukrainian morfologik dictionary once per JVM (#109 ) The UkrainianMorfologikAnalyzer was reloading its dictionary every time it created a new TokenStreamComponents, which meant that while the analyzer was open it would hold onto one copy of the dictionary per thread. This commit loads the dictionary in a lazy static initializer, alongside its stopword set. It also makes the normalizer charmap a singleton so that we do not rebuild the same immutable object on every call to initReader.	2021-04-28 13:51:23 +01:00
Gus Heck	0c33e621f9	LUCENE-9574 adjust changes entry	2021-04-27 23:13:11 -04:00
Michael Sokolov	45bd06c804	LUCENE-9905: rename Lucene90VectorFormat and its reader and writer	2021-04-27 18:59:40 -04:00
Michael Sokolov	6d4b5eaba3	LUCENE-9905: rename VectorValues.SearchStrategy to VectorValues.SimilarityFunction	2021-04-27 16:18:58 -04:00
Julie Tibshirani	3115f85697	LUCENE-9908: Move VectorValues#search to LeafReader (#104 ) This PR removes `VectorValues#search` in favor of exposing NN search through `VectorReader#search` and `LeafReader#searchNearestVectors`. It also marks the vector methods on `LeafReader` as experimental.	2021-04-26 11:26:49 -07:00
Ignacio Vera	6b386e7e68	LUCENE-9047: Remove unnecessary ByteBuffersDataOutput in BKD writer (#102 )	2021-04-26 10:28:26 +02:00
Kai	1b1fd7206b	Use HTTPS for documentation link (#105 )	2021-04-24 09:19:16 -04:00
Robert Muir	044d152d95	LUCENE-9928: speed up analysis/icu regeneration (#82 ) The compilation of the library is slow, disable optimization as it doesn't speed up our usage of the gennorm2 tool. Use better heuristic for make parallelism (tests.jvms rather than just hardcoded value of four).	2021-04-22 07:24:44 -04:00
John Carlson	2c43f57f91	Update gradle to 6.8.3 (#100 )	2021-04-21 21:02:37 +02:00
Tomoko Uchida	5f5d1949e9	LUCENE-9353: revise format documentation of Lucene90BlockTreeTermsWriter (#90 )	2021-04-20 23:36:34 +09:00
Ignacio Vera	5592d582b8	LUCENE-9047: Adapt big endian dependent code to work in little endian	2021-04-20 10:55:19 +02:00
Ignacio Vera	e0436872c4	LUCENE-9907: Move PackedInts#getReaderNoHeader() to backwards codec	2021-04-20 09:09:38 +02:00
Ignacio Vera	b0662c807c	LUCENE-9907: Remove unused method PackedInts.Mutable#save	2021-04-19 14:52:21 +02:00
Ignacio Vera	2a7951cd30	LUCENE-9907: Remove unused methods in PackedInts (#94 )	2021-04-19 14:10:49 +02:00
Dawid Weiss	bd8f182b13	LUCENE-9933: Add non-file properties to wrapped regenerate checksums (#95 )	2021-04-19 13:37:47 +02:00
Ignacio Vera	936b3451af	LUCENE-9907: Remove unused BlockPackedReader (#93 )	2021-04-19 09:47:44 +02:00
Ignacio Vera	d15231709a	LUCENE-9907: Remove dependency on PackedInts#getReaderNoHeader in MonotonicBlockPackedReader (#85 )	2021-04-19 07:18:41 +02:00
Dawid Weiss	beafd113de	LUCENE-9931: Rename checksummed regen. tasks FooInternal and generated wrappers Foo (#88 )	2021-04-16 22:35:51 +02:00
Mayya Sharipova	52e2abc665	Temporarily mute BaseTermVectorsFormatTestCase::testMerge (#89 ) Relates to #11, and #86	2021-04-16 15:27:17 -04:00
Mayya Sharipova	49c7cc1197	Fix test that modifies schema (#87 ) LUCENE-9334 requires that docs have the same schema across the whole schema. This fixes the test that attempts to modify schema of "number" field from DocValues and Points to just DocValues. Relates to #11	2021-04-15 17:48:49 -04:00
Mayya Sharipova	9a346e3739	Temporarily mute TestLucene50TermVectorsFormat:testMerge (#86 ) Relates to #11	2021-04-15 11:43:29 -04:00
Ignacio Vera	873ac5f162	LUCENE-9907: Remove packedInts#getReaderNoHeader dependency on TermsVectorFieldsFormat (#72 )	2021-04-15 16:04:13 +02:00
Mayya Sharipova	d03662c48b	LUCENE-9334 Consistency of field data structures Require consistency between data-structures on a per-field basis A field must be indexed with the same index options and data-structures across all documents. Thus, for example, it is not allowed to have one document where a certain field is indexed with doc values and points, and another document where the same field is indexed only with points. But it is allowed for a document not to have a certain field at all. As a consequence of this, doc values updates are only applicable for fields that are indexed with doc values only.	2021-04-14 15:00:41 -04:00
Adrien Grand	79f14b1742	LUCENE-9387: Remove CodecReader#ramBytesUsed. (#79 ) This commit removes `ramBytesUsed()` from `CodecReader` and all file formats besides vectors, which is the only remaining file format that might use lots of memory in the default codec. I left `ramBytesUsed()` on the `completion` format too, which is another feature that could use lots of memory. Other components that relied on being able to compute memory usage of readers like facets' TaxonomyReader and the analyzing suggester assume that readers have a RAM usage of 0 now.	2021-04-14 14:37:54 +02:00
Greg Miller	fbbdc62913	LUCENE-9850: Use PFOR encoding for doc IDs (instead of FOR) (#69 ) Co-authored-by: Greg Miller <gmiller@amazon.com> Co-authored-by: Adrien Grand <jpountz@gmail.com>	2021-04-14 14:36:20 +02:00
Dawid Weiss	0b1d8ccba6	LUCENE-9925: add checksums to snowball-generated files (#80 )	2021-04-13 08:59:31 +02:00
Mike McCandless	b23e261786	LUCENE-9888: revert CheckIndex change that confirmed all segments have identical segment sort: it is too strict	2021-04-12 17:59:58 -04:00
Michael Sokolov	757da76919	Revert "LUCENE-9798 : Fix looping bug and made Full Knn calculation parallelizable (#55 )" This reverts commit `e7de06eb51`.	2021-04-12 16:50:16 -04:00
Mike Drob	df0780843a	Add back-compat indices for 8.8.2	2021-04-12 15:07:30 -05:00
Mike Drob	68ccfb7d1e	Add bugfix version 8.8.2	2021-04-12 14:48:31 -05:00
Mike Drob	a2a68360ff	DOAP changes for release 8.8.2	2021-04-12 13:31:10 -05:00
Dawid Weiss	3f3917d504	LUCENE-9914: remove stale file.	2021-04-12 20:19:14 +02:00
Dawid Weiss	f91700a713	LUCENE-9914: Modernize Emoji regeneration scripts (#78 )	2021-04-12 20:16:43 +02:00
nitirajrathore	e7de06eb51	LUCENE-9798 : Fix looping bug and made Full Knn calculation parallelizable (#55 )	2021-04-12 12:38:29 -04:00
Adrien Grand	a7b0aadcfc	LUCENE-9827: Propagate `numChunks` through bulk merges for term vectors as well. This commit also adds more checks about the values of `numChunks`, `numDirtyChunks` and `numDirtyDocs` that would have helped discover this issue earlier.	2021-04-12 09:44:35 +02:00
Robert Muir	9d15435b15	LUCENE-9916: add a simple regeneration help doc (#73 ) Add a simple regeneration help doc Improve task help and checksum failure message (include corresponding regeneration task). Sorry for being verbose. Maybe somebody will read it. :) Co-authored-by: Dawid Weiss <dawid.weiss@carrotsearch.com>	2021-04-11 11:28:41 -04:00
Robert Muir	b0bd64c620	LUCENE-9924: generate TLD list from IANA TLD db, rather than root zone db (#77 ) This adds a bit of simplicity as the file is a simple domain list, rather than a DNS zone. So the regexes parsing DNS can be removed. Also the file may change less often as it contains JUST the list of TLDs, and not any additional DNS metadata.	2021-04-11 11:25:15 -04:00
Robert Muir	f33335157d	LUCENE-9923: remove always-changing timestamp from ASCIITLD.jflex generation (#76 ) This makes regenerate idempotent by removing the new Date() from the output. We already have the root.zone's Last-Modified date, which is the one that matters and only changes when the root.zone changes.	2021-04-10 16:13:29 -04:00
Robert Muir	15bfb28d7f	LUCENE-9922: checksum files should use a deterministic sort order (#75 ) This way the files don't unnecessarily change, depending on filesystem order or anything else.	2021-04-10 16:00:55 -04:00
Dawid Weiss	4818a83cb2	LUCENE-9920: Remove binary gradle-wrapper.jar from the repository	2021-04-10 16:08:39 +02:00
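
The LUCENE-9940 entry above describes replacing an order-sensitive `Arrays.equals()` comparison with a multiset comparison so that the order of disjuncts no longer affects equality. The following is a minimal, self-contained sketch of that idea only. It is not Lucene's code: the class `DisjunctEquality`, its method names, and the use of plain strings in place of `Query` objects are all hypothetical, and the multiset is approximated with a `HashMap` of counts.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration; strings stand in for Query objects.
public class DisjunctEquality {

  // Order-sensitive comparison, analogous to calling Arrays.equals() on an array.
  static boolean orderedEquals(String[] a, String[] b) {
    return Arrays.equals(a, b);
  }

  // Builds a minimal multiset: each element mapped to its occurrence count.
  static Map<String, Integer> toMultiset(List<String> items) {
    Map<String, Integer> counts = new HashMap<>();
    for (String item : items) {
      counts.merge(item, 1, Integer::sum);
    }
    return counts;
  }

  // Order-insensitive comparison that still distinguishes duplicate counts.
  static boolean multisetEquals(List<String> a, List<String> b) {
    return toMultiset(a).equals(toMultiset(b));
  }

  public static void main(String[] args) {
    String[] first = {"title:lucene", "body:lucene"};
    String[] second = {"body:lucene", "title:lucene"};

    // Same elements in a different order: array equality fails.
    System.out.println(orderedEquals(first, second));
    // The multiset comparison ignores order.
    System.out.println(multisetEquals(Arrays.asList(first), Arrays.asList(second)));
  }
}
```

A multiset rather than a plain set keeps duplicate counts significant: under this sketch, a collection containing the same element twice does not equal one containing it once.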

35080 Commits
