LUCENE-9354: Sync French stop words with latest version from Snowball. (#1474)

* Sync French stop words with latest version from Snowball.

This new version removed some French homonyms from the list.

* Use latest master commit from snowball-website

* LUCENE-9354: regenerate with 'gradle snowball'

* LUCENE-9354: add CHANGES.txt entry
Philippe Ouellet 2020-05-01 21:11:35 -04:00 committed by GitHub
parent 242f48a1ca
commit 7a849f6943
3 changed files with 14 additions and 11 deletions

@@ -31,7 +31,7 @@ configure(project(":lucene:analysis:common")) {
 // git commit hash of source code https://github.com/snowballstem/snowball/
 snowballStemmerCommit = "53739a805cfa6c77ff8496dc711dc1c106d987c1"
 // git commit hash of stopwords https://github.com/snowballstem/snowball-website
-snowballWebsiteCommit = "ff891e74f08e7315523ee3c0cad55bb1b7831b9d"
+snowballWebsiteCommit = "5a8cf2451d108217585d8e32d744f8b8fd20c711"
 // git commit hash of test data https://github.com/snowballstem/snowball-data
 snowballDataCommit = "9145f8732ec952c8a3d1066be251da198a8bc792"

@@ -93,6 +93,9 @@ Improvements
 Nepali, Serbian, and Tamil. New stoplist: Indonesian. Adds gradle 'snowball'
 task to regenerate and ease future upgrades. (Robert Muir, Dawid Weiss)
+* LUCENE-9354: Improvements to snowball french stopwords list, so that it is less
+aggressive. (Philippe Ouellet)
 * LUCENE-9114: Improve ValueSourceScorer's Default Cost Implementation (Atri Sharma, David Smiley)
 * LUCENE-9074: Introduce Slice Executor For Dynamic Runtime Execution Of Slices (Atri Sharma)

@@ -51,7 +51,7 @@ qui | who
 sa | his, her (fem)
 se | oneself
 ses | his (pl)
-son | his, her (masc)
+| son | his, her (masc). Omitted because it is homonym of "sound"
 sur | on
 ta | thy (fem)
 te | thee
@@ -79,15 +79,15 @@ t | t'
 y | there
 | forms of être (not including the infinitive):
-été
+| été - Omitted because it is homonym of "summer"
 étée
 étées
-étés
+| étés - Omitted because it is homonym of "summers"
 étant
 suis
 es
-est
+| est - Omitted because it is homonym of "east"
-sommes
+| sommes - Omitted because it is homonym of "sums"
 êtes
 sont
 serai
@@ -118,7 +118,7 @@ soyez
 soient
 fusse
 fusses
-fût
+| fût - Omitted because it is homonym of "tap", like in "beer on tap"
 fussions
 fussiez
 fussent
@@ -130,13 +130,13 @@ eue
 eues
 eus
 ai
-as
+| as - Omitted because it is homonym of "ace"
 avons
 avez
 ont
 aurai
-auras
+| auras - Omitted because it is also the name of a kind of wind
-aura
+| aura - Omitted because it is also the name of a kind of wind and homonym of "aura"
 aurons
 aurez
 auront
@@ -147,7 +147,7 @@ auriez
 auraient
 avais
 avait
-avions
+| avions - Omitted because it is homonym of "planes"
 aviez
 avaient
 eut
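
The stop word hunks above disable entries by moving them behind a `|`, which the list itself uses to introduce comments. The following is a minimal Python sketch of that mechanism, assuming everything from the first `|` to the end of a line is ignored. The function name `parse_snowball_stopwords` and the sample text are hypothetical illustrations and are not Lucene's actual loader.

```python
def parse_snowball_stopwords(text):
    """Collect stop words from Snowball-style text (hypothetical helper).

    Assumption: everything from the first '|' to the end of a line is a
    comment, and the remaining text holds whitespace-separated words.
    """
    words = set()
    for line in text.splitlines():
        content = line.split("|", 1)[0]  # drop the comment part, if any
        words.update(content.split())    # blank or comment-only lines add nothing
    return words


# Sample lines modelled on the hunks above (whitespace is illustrative).
sample = """\
sa | his, her (fem)
| son | his, her (masc). Omitted because it is homonym of "sound"
sur | on
| été - Omitted because it is homonym of "summer"
étée
"""

stop_words = parse_snowball_stopwords(sample)

# Words that were commented out are no longer filtered from a token stream.
tokens = ["son", "sur", "été", "étée"]
kept = [t for t in tokens if t not in stop_words]
```

Under these assumptions, `son` and `été` survive filtering because their lines now begin with a comment marker, while `sur` and `étée` are still treated as stop words.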