lucene

Commit Graph

Author	SHA1	Message	Date
Mayya Sharipova	52e2abc665	Temporarily mute BaseTermVectorsFormatTestCase::testMerge (#89) Relates to #11, and #86	2021-04-16 15:27:17 -04:00
Mayya Sharipova	49c7cc1197	Fix test that modifies schema (#87) LUCENE-9334 requires that docs have the same schema across the whole index. This fixes the test that attempts to modify the schema of the "number" field from DocValues and Points to just DocValues. Relates to #11	2021-04-15 17:48:49 -04:00
Mayya Sharipova	9a346e3739	Temporarily mute TestLucene50TermVectorsFormat:testMerge (#86) Relates to #11	2021-04-15 11:43:29 -04:00
Ignacio Vera	873ac5f162	LUCENE-9907: Remove packedInts#getReaderNoHeader dependency on TermsVectorFieldsFormat (#72)	2021-04-15 16:04:13 +02:00
Mayya Sharipova	d03662c48b	LUCENE-9334 Consistency of field data structures. Require consistency between data-structures on a per-field basis. A field must be indexed with the same index options and data-structures across all documents. Thus, for example, it is not allowed to have one document where a certain field is indexed with doc values and points, and another document where the same field is indexed only with points. But it is allowed for a document not to have a certain field at all. As a consequence of this, doc values updates are only applicable for fields that are indexed with doc values only.	2021-04-14 15:00:41 -04:00
Adrien Grand	79f14b1742	LUCENE-9387: Remove CodecReader#ramBytesUsed. (#79) This commit removes `ramBytesUsed()` from `CodecReader` and all file formats besides vectors, which is the only remaining file format that might use lots of memory in the default codec. I left `ramBytesUsed()` on the `completion` format too, which is another feature that could use lots of memory. Other components that relied on being able to compute memory usage of readers like facets' TaxonomyReader and the analyzing suggester assume that readers have a RAM usage of 0 now.	2021-04-14 14:37:54 +02:00
Greg Miller	fbbdc62913	LUCENE-9850: Use PFOR encoding for doc IDs (instead of FOR) (#69) Co-authored-by: Greg Miller <gmiller@amazon.com> Co-authored-by: Adrien Grand <jpountz@gmail.com>	2021-04-14 14:36:20 +02:00
Dawid Weiss	0b1d8ccba6	LUCENE-9925: add checksums to snowball-generated files (#80)	2021-04-13 08:59:31 +02:00
Mike McCandless	b23e261786	LUCENE-9888: revert CheckIndex change that confirmed all segments have identical segment sort: it is too strict	2021-04-12 17:59:58 -04:00
Michael Sokolov	757da76919	Revert "LUCENE-9798 : Fix looping bug and made Full Knn calculation parallelizable (#55)" This reverts commit `e7de06eb51`.	2021-04-12 16:50:16 -04:00
Mike Drob	df0780843a	Add back-compat indices for 8.8.2	2021-04-12 15:07:30 -05:00
Mike Drob	68ccfb7d1e	Add bugfix version 8.8.2	2021-04-12 14:48:31 -05:00
Mike Drob	a2a68360ff	DOAP changes for release 8.8.2	2021-04-12 13:31:10 -05:00
Dawid Weiss	3f3917d504	LUCENE-9914: remove stale file.	2021-04-12 20:19:14 +02:00
Dawid Weiss	f91700a713	LUCENE-9914: Modernize Emoji regeneration scripts (#78)	2021-04-12 20:16:43 +02:00
nitirajrathore	e7de06eb51	LUCENE-9798 : Fix looping bug and made Full Knn calculation parallelizable (#55)	2021-04-12 12:38:29 -04:00
Adrien Grand	a7b0aadcfc	LUCENE-9827: Propagate `numChunks` through bulk merges for term vectors as well. This commit also adds more checks about the values of `numChunks`, `numDirtyChunks` and `numDirtyDocs` that would have helped discover this issue earlier.	2021-04-12 09:44:35 +02:00
Robert Muir	9d15435b15	LUCENE-9916: add a simple regeneration help doc (#73) Add a simple regeneration help doc. Improve task help and checksum failure message (include corresponding regeneration task). Sorry for being verbose. Maybe somebody will read it. :) Co-authored-by: Dawid Weiss <dawid.weiss@carrotsearch.com>	2021-04-11 11:28:41 -04:00
Robert Muir	b0bd64c620	LUCENE-9924: generate TLD list from IANA TLD db, rather than root zone db (#77) This adds a bit of simplicity as the file is a simple domain list, rather than a DNS zone. So the regexes parsing DNS can be removed. Also the file may change less often as it contains JUST the list of TLDs, and not any additional DNS metadata.	2021-04-11 11:25:15 -04:00
Robert Muir	f33335157d	LUCENE-9923: remove always-changing timestamp from ASCIITLD.jflex generation (#76) This makes regenerate idempotent by removing the new Date() from the output. We already have the root.zone's Last-Modified date, which is the one that matters and only changes when the root.zone changes.	2021-04-10 16:13:29 -04:00
Robert Muir	15bfb28d7f	LUCENE-9922: checksum files should use a deterministic sort order (#75) This way the files don't unnecessarily change, depending on filesystem order or anything else.	2021-04-10 16:00:55 -04:00
Dawid Weiss	4818a83cb2	LUCENE-9920: Remove binary gradle-wrapper.jar from the repository	2021-04-10 16:08:39 +02:00
Julie Tibshirani	c587677150	LUCENE-9705: Correct the format names in Lucene90StoredFieldsFormat (#74) We accidentally kept the old names when creating the new format.	2021-04-09 16:19:43 -07:00
Adrien Grand	e510ef11c2	LUCENE-9827: Propagate `numChunks` through bulk merges.	2021-04-08 16:45:52 +02:00
Uwe Schindler	779e00542c	Make the character printout code uniform (always print at least 4 hex chars)	2021-04-08 16:38:31 +02:00
Dawid Weiss	4c2384a1f3	LUCENE-9872: load input/output checksums prior to executing the target task, even if regenerate is not called.	2021-04-08 15:00:20 +02:00
Robert Muir	2971f311a2	LUCENE-9911: enable ecjLint unusedExceptionParameter (#70) Fails the linter if an exception is swallowed (e.g. variable completely unused). If this is intentional for some reason, the exception can simply be annotated with @SuppressWarnings("unused").	2021-04-08 08:19:01 -04:00
Peter Gromov	7f147fece0	LUCENE-9894: Hunspell: add user-friendly diagnostics for morph data API misuse (#51)	2021-04-07 14:52:36 +02:00
Peter Gromov	8eb582e671	LUCENE-9895: Hunspell: make suggest-with-timeout API public (#54)	2021-04-07 14:52:29 +02:00
Robert Muir	df25653cbd	LUCENE-9882: better synchronize eclipse formatter with spotless. (#47) Import the spotless formatting settings to our eclipse IDE setting, so that it is a closer match.	2021-04-07 06:20:42 -04:00
Robert Muir	4026753744	LUCENE-9910: maximize javac lint (#68) This enables quite a few javac warnings from java11+ that weren't enabled for some reason. None of them fail, so lock them in. Additionally some newer checks are only recognized for newer JDK versions, so they are only enabled based on the javac version used. They will cause no annoyance because they relate to newer language features.	2021-04-07 06:10:29 -04:00
Ignacio Vera	430b3baa80	LUCENE-9907: Remove packedInts dependency on StoredFieldsFormat (#64)	2021-04-07 11:33:49 +02:00
Dawid Weiss	39071dbc54	LUCENE-9904: Port GenerateJflexTLDMacros.java regeneration to gradle and regenerate UAX tokenizer with up-to-date TLDs	2021-04-07 10:56:21 +02:00
Gautam Worah	efeea0b8ee	LUCENE-9902 Minor fixes to the faceting API (#62)	2021-04-06 14:50:23 -04:00
Robert Muir	be94a667f2	LUCENE-9827: avoid wasteful recompression for small segments (#28) Require that the segment has enough dirty documents to create a clean chunk before recompressing during merge, there must be at least maxChunkSize. This prevents wasteful recompression with small flushes (e.g. every document): we ensure recompression achieves some "permanent" progress. Expose maxDocsPerChunk as a parameter for Term vectors too, matching the stored fields format. This allows for easy testing. Increment numDirtyDocs for partially optimized merges: If segment N needs recompression, we have to flush any buffered docs before bulk-copying segment N+1. Don't just increment numDirtyChunks, also make sure numDirtyDocs is incremented, too. This doesn't have a performance impact, and is unrelated to tooDirty() improvements, but it is easier to reason about things with correct statistics in the index. Further tuning of how dirtiness is measured: for simplification just use percentage of dirty chunks. Co-authored-by: Adrien Grand <jpountz@gmail.com>	2021-04-06 14:18:48 -04:00
Adrien Grand	d991fefb49	Add an example to the CacheHelper docs. (#50)	2021-04-06 16:25:15 +02:00
Dawid Weiss	2662a74cab	Correct some of the jdk17-offending javadocs.	2021-04-05 20:34:52 +02:00
Dawid Weiss	2773172455	Correct some of the jdk17-offending javadocs.	2021-04-05 20:21:52 +02:00
Dawid Weiss	baceb16904	Correct some of the jdk17-offending javadocs.	2021-04-05 20:19:56 +02:00
Dawid Weiss	fbf9191abf	LUCENE-9901: UnicodeData.java has no regeneration task (#63)	2021-04-05 20:12:56 +02:00
Ignacio Vera	67a0bd4b6d	LUCENE-9705: Final clean-up and entry in CHANGES.txt (#59)	2021-04-04 11:30:47 +02:00
Dawid Weiss	010e3a1ba9	LUCENE-9900: Regenerate/ run ICU only if inputs changed (#61)	2021-04-02 11:46:43 +02:00
Dawid Weiss	e3ae57a3c1	LUCENE-9872: Make the most painful tasks in regenerate fully incremental (#60)	2021-04-02 09:56:47 +02:00
Tomoko Uchida	670bbf8b99	Ignore sdkmanrc file on Git (#58)	2021-04-02 01:04:14 +09:00
Ignacio Vera	8c9b9546cc	LUCENE-9705: Create Lucene90PointsFormat (#52)	2021-04-01 07:04:04 +02:00
Pieter van Boxtel	1d579b9448	LUCENE-9898 Remove no longer used scorePayload method from BM25Similarity (#57)	2021-04-01 09:06:03 +09:00
zacharymorn	79fcd99f4c	LUCENE-9883: Turn on ecj missingEnumCaseDespiteDefault setting (#56)	2021-03-31 15:50:52 +09:00
Dawid Weiss	32e891c60f	LUCENE-9871: move dummy outputs aspect into a separate file.	2021-03-30 20:15:55 +02:00
Adrien Grand	10520185a9	LUCENE-9877: Move CHANGES entry under 8.9.	2021-03-30 15:13:00 +02:00
Greg Miller	fd79f9737a	LUCENE-9877: Allow up to 7 exceptions in PForUtil (instead of 3) (#48) Co-authored-by: Greg Miller <gmiller@amazon.com>	2021-03-30 15:11:33 +02:00
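
The per-field consistency rule described in commit d03662c48b (LUCENE-9334) can be illustrated with a small model. This is a minimal sketch in Python, not Lucene's implementation: `FieldSchema` and `FieldSchemaRegistry` are hypothetical names invented for illustration, and only two data structures (doc values and points) are modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldSchema:
    """Hypothetical summary of the data structures a field is indexed with."""
    doc_values: bool = False
    points: bool = False


class FieldSchemaRegistry:
    """Hypothetical index-wide registry of field schemas (not Lucene's API)."""

    def __init__(self):
        self._schemas = {}

    def add_document(self, doc):
        """Accept a document (field name -> FieldSchema) or raise ValueError.

        Every field must use the same schema as in earlier documents.
        A document may omit a known field entirely.
        """
        for name, schema in doc.items():
            known = self._schemas.get(name)
            if known is not None and known != schema:
                raise ValueError(
                    f"field {name!r} was indexed as {known}, got {schema}"
                )
        # Record schemas only after the whole document has been validated.
        self._schemas.update(doc)

    def can_update_doc_values(self, name):
        """Doc values updates apply only to fields indexed with doc values only."""
        return self._schemas.get(name) == FieldSchema(doc_values=True)


registry = FieldSchemaRegistry()
registry.add_document({"number": FieldSchema(doc_values=True, points=True)})
# A document without the "number" field is allowed.
registry.add_document({"title": FieldSchema(doc_values=True)})

try:
    # Same field, different data structures: rejected.
    registry.add_document({"number": FieldSchema(doc_values=True)})
    rejected = False
except ValueError:
    rejected = True
```

In this model the third document is rejected because "number" was first indexed with both doc values and points, which mirrors the schema change that the test fixed in commit 49c7cc1197 attempted.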


35052 Commits
