lucene

Commit Graph

Author	SHA1	Message	Date
Dawid Weiss	20cb6817db	LUCENE-10234: Change module prefix to org.apache.* (#487 )	2021-11-30 22:03:33 +01:00
Robert Muir	5d18596d3d	LUCENE-10248: add CHANGES.txt entry	2021-11-30 15:57:24 -05:00
Xavier Sanchez Loro	edb936f090	LUCENE-10248: Spanish Plural Stemmer (#461 ) Adds a new Spanish stemmer just for stemming plural to singular whilst maintaining gender: the SpanishPluralStemmer. The goal is to provide a lightweight algorithmic approach with better precision and recall than current approaches. See blog post for more details: https://medium.com/inside-wallapop/spanish-plural-stemmer-matching-plural-and-singular-forms-in-spanish-using-lucene-93e005e38373 This approach is based on rules specified in WikiLingua: http://www.wikilengua.org/index.php/Plural_(formaci%C3%B3n) Some characteristics: * Designed to stem just plural to singular form * Distinguishes between masculine and feminine forms * It will increase recall but precision can be reduced depending on the use case/information need * Stems plural words of foreign origin: i.e. complots, bits, punks, robots * Support for invariant words: same plural and singular form or plural does not make sense: i.e. crisis, jueves, lapsus, abrebotellas, etc * Support for special cases: i.e. yoes, clubes, itemes, faralaes * Use it when the distinction between singular and plural is not relevant but gender is relevant * Produces meaningful tokens in form of singular * Not strange stems like “amig”: it’s true that stemmers must not generate grammatically correct tokens, but if we generate correct stems we decrease the possibility of collisions with other words	2021-11-30 15:51:10 -05:00
Greg Miller	f48a430f35	LUCENE-10232: Fix MultiRangeQuery to confirm all dimensions for a given range match (#437 )	2021-11-30 11:58:38 -08:00
Robert Muir	46a5a57724	LUCENE-10272: cross-check norms with postings in checkindex (#493 ) Previously, CheckIndex would iterate norms and validate each one. But if norms that should be there were missing, nothing would fail. Now it computes an expected count of norms and ensures it saw them all.	2021-11-30 14:21:40 -05:00
Alan Woodward	749b744c0c	LUCENE-10263: Implement Weight.count() on NormsFieldExistsQuery (#477 ) If all documents in the segment have a value, then `Reader.getDocCount()` will equal `maxDoc` and we can return `numDocs` as a shortcut.	2021-11-30 10:00:38 +00:00
Greg Miller	4f5b41a71c	Add javadoc note in DoubleValuesSource (see LUCENE-10258) (#490 )	2021-11-29 18:00:52 -08:00
Robert Muir	453168ec76	support tables in generated html documentation (#489 ) Tables can be used in markdown (e.g. MIGRATE.md) and will become html tables in our generated HTML docs on the website.	2021-11-29 17:38:14 -05:00
Robert Muir	5aa9da9ead	Improve MIGRATE.md around analyzers artifacts. (#488 ) * Improve MIGRATE.md around analyzers artifacts. Move this to the very top of MIGRATE, the user needs to first be able to pull in the artifacts, before doing anything else like trying to compile, deal with renamed classes, etc. Add a table of each package that got moved, with explicit old and new names. Hopefully it helps search engines and users. Link to MIGRATE.md explicitly from README.md	2021-11-29 17:04:15 -05:00
Ignacio Vera	78c8d7b7ea	LUCENE-9538: Detect polygon self-intersections in the Tessellator (#428 ) Detect self-intersections so it can provide a more meaningful error to the users.	2021-11-29 11:05:03 +01:00
Ignacio Vera	634c22c527	LUCENE-10264: Clone index input when creating a PointTree in SimpleTextBKDReader (#478 ) Fixes a race condition introduced in LUCENE-9820.	2021-11-29 09:20:20 +01:00
Robert Muir	63c89f678d	Speed up ECJ tasks by avoiding --release (#484 ) LUCENE-10185 caused a large performance regression in ECJ tasks by using the --release flag. Instead of using --release, we can just disable "terminal deprecation", and leave this check to `javac`. The --release flag makes this tool run 50% slower.	2021-11-28 15:10:32 -05:00
Robert Muir	1fb45da7bb	upgrade ecj linter from 3.25.0 -> 3.27.0 (#483 ) The newest version has a significant performance increase for our use-case.	2021-11-28 12:05:19 -05:00
Robert Muir	3772ff563a	speed up extremely slow test methods (runtime 15-30s) (#471 )	2021-11-28 09:40:43 -05:00
Tomoko Uchida	cb5f1b6ca0	Use the same analysis chain to StandardAnalyzer (a follow-up of #480 ) (#482 )	2021-11-28 21:22:28 +09:00
Tomoko Uchida	c041517304	set group to 'run' benchmark task (#481 )	2021-11-28 21:22:07 +09:00
Tomoko Uchida	9eb7857199	fix typo in documentation	2021-11-28 10:11:49 +09:00
Uwe Schindler	aed47c1862	Fix wrong path in documentation	2021-11-28 00:55:28 +01:00
Tomoko Uchida	57f695b14d	LUCENE-10261: clean up reflection stuff in luke module and make minor adjustments (#480 )	2021-11-27 15:36:38 +09:00
Dawid Weiss	1029651d12	Don't log warnings from ant (different class loader, I guess). Makes Alan happier.	2021-11-26 11:39:55 +01:00
Dawid Weiss	651755aab7	LUCENE-10260: Luke's about window no longer shows version number (#473 )	2021-11-26 08:32:23 +01:00
Ignacio Vera	a590c6d2a0	LUCENE-10262: Lift up restrictions for navigating PointValues#PointTree (#476 ) This change allows random navigation of a PointValues#PointTree.	2021-11-26 07:42:13 +01:00
Uwe Schindler	d973e50c15	LUCENE-10259: Fix startup scripts to allow whitespace in path names and use /bin/sh only (#472 )	2021-11-25 16:07:23 +01:00
Tomoko Uchida	40b38438c8	LUCENE-10261: Remove preset analyzer panel from Luke Analysis UI. (#475 )	2021-11-25 20:30:36 +09:00
Ignacio Vera	800f002e44	LUCENE-9820: PointTree#size() should handle the case of balanced tree in pre-8.6 indexes (#462 ) Handle properly the case where trees are fully balanced for number of dimension > 1	2021-11-25 11:03:16 +01:00
Adrien Grand	8710252116	Fix test failures with testIndexUpgraderCommandLineArgs and ExtraFS.	2021-11-25 08:51:56 +01:00
Adrien Grand	f80d816ce7	Speed up TestBackwardsCompatibility#testCommandLineArgs. (#467 ) This test unzips files that we already unzipped. This commit copies the already uncompressed files instead.	2021-11-24 08:25:22 +01:00
Adrien Grand	24fcd80a37	LUCENE-10168: Only test N-2 codecs on nightly runs. (#466 ) In order for tests to keep running fast, this annotates all tests of N-2 codecs with `@Nightly`. To keep good coverage of releases, the smoke tester is now configured to run nightly tests.	2021-11-24 08:20:04 +01:00
Greg Miller	6ee69e06fb	LUCENE-10062: Switch to numeric doc values for encoding taxonomy ordinals (#264 )	2021-11-23 06:00:11 -08:00
David Smiley	0fcf9c825f	Javadocs, Sorter impls (#426 ) * Javadocs, Sorter impls * clarify which sorts are stable/not * link from utility methods to the primary Sorter implementations for further information * describe when InPlaceMergeSorter is useful. Fix incorrect statement that it uses insertion sort. * Javadocs for Sorter	2021-11-23 07:13:40 -05:00
Tomoko Uchida	4193bcbc02	LUCENE-10200: fix luke launch script.	2021-11-22 18:46:28 +09:00
Greg Miller	78ee53f837	Add missing CHANGES entry	2021-11-21 07:41:25 -08:00
Greg Miller	9d7e5ef388	Fixup TestCombinedFieldQuery to not (randomly) use numHits = 0	2021-11-21 07:38:28 -08:00
Andriy Redko	5993b9050a	LUCENE-10244: Please consider opening MultiCollector::getCollectors for public use (#455 )	2021-11-21 07:36:54 -08:00
Adrien Grand	0902d803fd	Add dash between `rev` and the git hash.	2021-11-20 08:09:42 +01:00
Quentin Pradet	1a869c185b	LUCENE-10085: Implement Weight#count on DocValuesFieldExistsQuery (#445 ) Co-authored-by: Adrien Grand <jpountz@gmail.com>	2021-11-19 18:06:58 +01:00
Robert Muir	af831d2810	LUCENE-10239: upgrade jflex (1.7.0 -> 1.8.2) (#452 ) Upgrade jflex. Change doesn't alter the behavior of any of the analyzers (unicode version or grammar refactorings), just the minimal to get new tooling working.	2021-11-19 09:24:27 -05:00
Ignacio Vera	ad911df260	LUCENE-9820: Separate logic for reading the BKD index from logic to intersecting it (#7 ) Extract BKD tree interface and move intersecting logic to the PointValues abstract class.	2021-11-19 08:28:01 +01:00
zacharymorn	07ee3ba83a	LUCENE-10236: Update field-weight used in CombinedFieldQuery scoring calculation (#444 )	2021-11-18 21:36:38 -08:00
Andriy Redko	6bd5c14bf3	LUCENE-10242: The TopScoreDocCollector::createSharedManager should use ScoreDoc instead of FieldDoc (#450 ) Signed-off-by: Andriy Redko <andriy.redko@aiven.io>	2021-11-18 16:35:59 +01:00
Patrick Zhai	b4476e4318	LUCENE-10122 Use NumericDocValue to store taxonomy parent array instead of custom term positions (#451 )	2021-11-17 19:32:34 -05:00
Dawid Weiss	bae095ae48	LUCENE-10240: gradle regenerate fails on java 17 (#449 )	2021-11-17 18:36:34 +01:00
Dawid Weiss	0eeba8d37c	LUCENE-10238: Update icu4j to 70.1. (#447 )	2021-11-17 18:13:40 +01:00
Adrien Grand	556c7c5fb5	Add back-compat indices for 8.11.0.	2021-11-17 11:53:49 +01:00
Bruno Roustant	c71cbac4f9	LUCENE-10225: Improve IntroSelector with 3-way partitioning.	2021-11-17 10:38:27 +01:00
Adrien Grand	c0112dd2ff	DOAP changes for release 8.11.0	2021-11-16 10:54:24 +01:00
Dawid Weiss	f5e5cf008a	LUCENE-10234: Add automatic module name to JAR manifests. (#440 )	2021-11-15 17:02:40 +01:00
Quentin Pradet	1e5e997880	LUCENE-10085: Rename DocValuesFieldExistsQuery test (#441 ) FieldValueQuery got renamed to DocValuesFieldExistsQuery but the test wasn't renamed.	2021-11-15 16:24:29 +01:00
Julie Tibshirani	3b914a4d73	LUCENE-10069: Document that kNN queries might not return all results (#434 ) Performing a kNN search with very large k may return fewer than k documents. This is due to the fact that the HNSW graph is not guaranteed to be connected. This commit documents the behavior as part of a general warning that the results of a kNN search may be approximate.	2021-11-12 14:19:20 -08:00
Julie Tibshirani	2a9adb81df	LUCENE-10063: Fix score calculation in SimpleTextKnnVectorsFormat The method VectorSimilarityFunction#convertToScore already reverses the similarity, so we shouldn't reverse it again.	2021-11-11 11:22:03 -08:00

35575 Commits

All Branches