lucene

Commit Graph

Author	SHA1	Message	Date
Xavier Sanchez Loro	0293fc896d	LUCENE-10248: Spanish Plural Stemmer (#461) Adds a new Spanish stemmer just for stemming plural to singular whilst maintaining gender: the SpanishPluralStemmer. The goal is to provide a lightweight algorithmic approach with better precision and recall than current approaches. See blog post for more details: https://medium.com/inside-wallapop/spanish-plural-stemmer-matching-plural-and-singular-forms-in-spanish-using-lucene-93e005e38373 This approach is based on rules specified in Wikilengua: http://www.wikilengua.org/index.php/Plural_(formaci%C3%B3n) Some characteristics: * Designed to stem just plural to singular form * Distinguishes between masculine and feminine forms * It will increase recall but precision can be reduced depending on the use case/information need * Stems plural words of foreign origin: e.g. complots, bits, punks, robots * Support for invariant words: same plural and singular form or plural does not make sense: e.g. crisis, jueves, lapsus, abrebotellas, etc. * Support for special cases: e.g. yoes, clubes, itemes, faralaes * Use it when the distinction between singular and plural is not relevant but gender is relevant * Produces meaningful tokens in singular form * No strange stems like “amig”: it’s true that stemmers need not generate grammatically correct tokens, but if we generate correct stems we decrease the possibility of collisions with other words	2021-11-30 16:07:09 -05:00
Robert Muir	468aceff0c	LUCENE-10248: add CHANGES.txt entry	2021-11-30 16:07:09 -05:00
Dawid Weiss	26257292c3	LUCENE-10234: Change module prefix to org.apache.* (#487)	2021-11-30 22:04:24 +01:00
Robert Muir	c89c78cee0	LUCENE-10272: cross-check norms with postings in checkindex (#493) Previously, CheckIndex would iterate norms and validate each one. But if norms that should be there were missing, nothing would fail. Now it computes an expected count of norms and ensures it saw them all.	2021-11-30 15:40:35 -05:00
Greg Miller	51e023bf7a	LUCENE-10232: Fix MultiRangeQuery to confirm all dimensions for a given range match	2021-11-30 12:01:34 -08:00
Alan Woodward	b697745407	LUCENE-10263: Implement Weight.count() on NormsFieldExistsQuery (#477) If all documents in the segment have a value, then `Reader.getDocCount()` will equal `maxDoc` and we can return `numDocs` as a shortcut.	2021-11-30 10:07:39 +00:00
Greg Miller	8a03d2ffc9	Add javadoc note in DoubleValuesSource (see LUCENE-10258) (#490)	2021-11-29 18:02:26 -08:00
Robert Muir	c5b5fd641b	support tables in generated html documentation (#489) Tables can be used in markdown (e.g. MIGRATE.md) and will become html tables in our generated HTML docs on the website.	2021-11-29 17:44:26 -05:00
Robert Muir	278316377c	Improve MIGRATE.md around analyzers artifacts. (#488) Move this to the very top of MIGRATE: the user first needs to be able to pull in the artifacts before doing anything else, like trying to compile, deal with renamed classes, etc. Add a table of each package that got moved, with explicit old and new names. Hopefully it helps search engines and users. Link to MIGRATE.md explicitly from README.md	2021-11-29 17:44:26 -05:00
Ignacio Vera	70243ea811	LUCENE-9538: Detect polygon self-intersections in the Tessellator (#428) Detect self-intersections so it can provide a more meaningful error to the users.	2021-11-29 11:06:06 +01:00
Ignacio Vera	62084d7138	LUCENE-10264: Clone index input when creating a PointTree in SimpleTextBKDReader (#478) Fixes a race condition introduced in LUCENE-9820.	2021-11-29 09:21:27 +01:00
Robert Muir	8d0103724d	Speed up ECJ tasks by avoiding --release (#484) LUCENE-10185 caused a large performance regression in ECJ tasks by using the --release flag. Instead of using --release, we can just disable "terminal deprecation", and leave this check to `javac`. The --release flag makes this tool run 50% slower.	2021-11-28 15:11:02 -05:00
Robert Muir	95095d0d49	upgrade ecj linter from 3.25.0 -> 3.27.0 (#483) The newest version has a significant performance increase for our use-case.	2021-11-28 12:05:42 -05:00
Robert Muir	756550f88b	speed up extremely slow test methods (runtime 15-30s) (#471)	2021-11-28 09:41:15 -05:00
Tomoko Uchida	38762ee8cf	Use the same analysis chain as StandardAnalyzer (a follow-up of #480) (#482)	2021-11-28 21:24:11 +09:00
Tomoko Uchida	93bb52c601	set group to 'run' benchmark task (#481)	2021-11-28 21:23:48 +09:00
Tomoko Uchida	eb912a9158	fix typo in documentation	2021-11-28 10:13:28 +09:00
Uwe Schindler	92a2428906	Fix wrong path in documentation	2021-11-28 00:56:27 +01:00
Tomoko Uchida	e222031943	LUCENE-10261: clean up reflection stuff in luke module and make minor adjustments (#480)	2021-11-27 15:42:16 +09:00
Dawid Weiss	f599a8e2ee	LUCENE-10260: Luke's about window no longer shows version number (#473)	2021-11-26 08:33:30 +01:00
Ignacio Vera	cb0c2b87ed	LUCENE-10262: Lift up restrictions for navigating PointValues#PointTree (#476) This change allows random navigation of a PointValues#PointTree.	2021-11-26 07:43:43 +01:00
Uwe Schindler	30c4b8d5b8	LUCENE-10259: Fix startup scripts to allow whitespace in path names and use /bin/sh only (#472)	2021-11-25 16:11:37 +01:00
Tomoko Uchida	bfa3f01a17	LUCENE-10261: Remove preset analyzer panel from Luke Analysis UI. (#475)	2021-11-25 20:33:34 +09:00
Ignacio Vera	58ef7d911a	LUCENE-9820: PointTree#size() should handle the case of balanced tree in pre-8.6 indexes (#462) (#474) Properly handle the case where trees are fully balanced and the number of dimensions is > 1	2021-11-25 11:19:02 +01:00
Adrien Grand	a97a1e2815	Fix test failures with testIndexUpgraderCommandLineArgs and ExtraFS.	2021-11-25 08:50:27 +01:00
David Smiley	e2e99da4a8	Javadocs, Sorter impls (#426) * clarify which sorts are stable/not * link from utility methods to the primary Sorter implementations for further information * describe when InPlaceMergeSorter is useful. Fix incorrect statement that it uses insertion sort. * Javadocs for Sorter	2021-11-25 00:44:58 -05:00
Adrien Grand	b3a36166a5	Speed up TestBackwardsCompatibility#testCommandLineArgs. (#467) This test unzips files that we already unzipped. This commit copies the already uncompressed files instead.	2021-11-24 08:26:42 +01:00
Adrien Grand	3f634e2ab9	LUCENE-10168: Only test N-2 codecs on nightly runs. In order for tests to keep running fast, this annotates all tests of N-2 codecs with `@Nightly`. To keep good coverage of releases, the smoke tester is now configured to run nightly tests.	2021-11-24 08:26:42 +01:00
Tomoko Uchida	170137129a	LUCENE-10200: fix luke launch script.	2021-11-22 19:16:18 +09:00
Andriy Redko	51c37db005	LUCENE-10244: MultiCollector::getCollectors is now public	2021-11-21 07:44:08 -08:00
Adrien Grand	ee8829da5b	Add dash between `rev` and the git hash.	2021-11-20 08:09:33 +01:00
Greg Miller	0ba310782f	LUCENE-10062: Switch to numeric doc values for encoding taxonomy ordinals	2021-11-19 13:11:42 -08:00
Patrick Zhai	6b99f03cdd	LUCENE-10122 Use NumericDocValue to store taxonomy parent array (#454)	2021-11-19 13:05:56 -05:00
Quentin Pradet	631d1ad749	LUCENE-10085: Implement Weight#count on DocValuesFieldExistsQuery (#445) Co-authored-by: Adrien Grand <jpountz@gmail.com>	2021-11-19 18:07:29 +01:00
Robert Muir	ee56d31425	LUCENE-10239: upgrade jflex (1.7.0 -> 1.8.2) (#452) Upgrade jflex. Change doesn't alter the behavior of any of the analyzers (unicode version or grammar refactorings), just the minimal to get new tooling working.	2021-11-19 09:28:11 -05:00
Ignacio Vera	9adf7e27f9	LUCENE-9820: Separate logic for reading the BKD index from logic for intersecting it (#7) (#457) Extract BKD tree interface and move intersecting logic to the PointValues abstract class.	2021-11-19 08:39:28 +01:00
Jim Ferenczi	2e5c4bb5a5	LUCENE-10208: Ensure that the minimum competitive score does not decrease in concurrent search (#431) Co-authored-by: Adrien Grand <jpountz@gmail.com>	2021-11-18 17:33:04 +01:00
Andriy Redko	42bee6f223	LUCENE-10242: The TopScoreDocCollector::createSharedManager should use ScoreDoc instead of FieldDoc (#450) Signed-off-by: Andriy Redko <andriy.redko@aiven.io>	2021-11-18 16:36:32 +01:00
Dawid Weiss	8d07018050	LUCENE-10240: gradle regenerate fails on java 17 (#449)	2021-11-17 18:36:58 +01:00
Dawid Weiss	4c22d30f80	LUCENE-10238: Update icu4j to 70.1. (#447)	2021-11-17 18:14:33 +01:00
Adrien Grand	7ce0cfa9c5	Add back-compat indices for 8.11.0	2021-11-17 11:51:18 +01:00
Bruno Roustant	02a63f688c	LUCENE-10225: Improve IntroSelector with 3-way partitioning.	2021-11-17 11:31:11 +01:00
Adrien Grand	b6f456573a	DOAP changes for release 8.11.0	2021-11-16 10:55:08 +01:00
Dawid Weiss	9d0eb88d2c	LUCENE-10234: Add automatic module name to JAR manifests. (#440)	2021-11-15 17:03:08 +01:00
Quentin Pradet	e034a2d6e2	LUCENE-10085: Rename DocValuesFieldExistsQuery test (#441) FieldValueQuery got renamed to DocValuesFieldExistsQuery but the test wasn't renamed.	2021-11-15 16:24:57 +01:00
Julie Tibshirani	607b10dc2a	LUCENE-10069: Document that kNN queries might not return all results (#434) Performing a kNN search with very large k may return fewer than k documents. This is due to the fact that the HNSW graph is not guaranteed to be connected. This commit documents the behavior as part of a general warning that the results of a kNN search may be approximate.	2021-11-12 14:20:09 -08:00
Julie Tibshirani	68be365283	LUCENE-10063: Fix score calculation in SimpleTextKnnVectorsFormat The method VectorSimilarityFunction#convertToScore already reverses the similarity, so we shouldn't reverse it again.	2021-11-11 11:36:50 -08:00
Julie Tibshirani	9c73562161	LUCENE-10228: Ensure PerFieldKnnVectorsFormat uses right format name (#432) Before, when creating a KnnVectorsWriter for merging, we consulted the existing "PER_FIELD_SUFFIX_KEY" attribute to determine the format's per-field suffix. This isn't correct since we could be using a new codec (that produces different formats/suffixes). This commit modifies TestPerFieldDocValuesFormat#testMergeUsesNewFormat to trigger the problem. Without the fix it throws an error like "java.nio.file.FileAlreadyExistsException: File "_3_Lucene90HnswVectorsFormat_0.vem" was already written to."	2021-11-11 11:22:52 -08:00
Dawid Weiss	ff9ee28c60	LUCENE-10223: interval support in standard syntax parser (#429)	2021-11-11 08:56:48 +01:00
Dawid Weiss	238cd5fd0c	LUCENE-10226: test target creates a weird folder (lazy property).	2021-11-09 08:38:42 +01:00

35558 Commits, All Branches
