LUCENE-9354: Sync French stop words with latest version from Snowball. (#1474)

* Sync French stop words with latest version from Snowball.

This new version removes several French words from the list because they are homonyms of content words (for example "son", "été", and "avions").

* Use latest master commit from snowball-website

* LUCENE-9354: regenerate with 'gradle snowball'

* LUCENE-9354: add CHANGES.txt entry
Philippe Ouellet 2020-05-01 21:11:35 -04:00 committed by GitHub
parent 242f48a1ca
commit 7a849f6943
3 changed files with 14 additions and 11 deletions

@@ -31,7 +31,7 @@ configure(project(":lucene:analysis:common")) {
 // git commit hash of source code https://github.com/snowballstem/snowball/
 snowballStemmerCommit = "53739a805cfa6c77ff8496dc711dc1c106d987c1"
 // git commit hash of stopwords https://github.com/snowballstem/snowball-website
-snowballWebsiteCommit = "ff891e74f08e7315523ee3c0cad55bb1b7831b9d"
+snowballWebsiteCommit = "5a8cf2451d108217585d8e32d744f8b8fd20c711"
 // git commit hash of test data https://github.com/snowballstem/snowball-data
 snowballDataCommit = "9145f8732ec952c8a3d1066be251da198a8bc792"

@@ -93,6 +93,9 @@ Improvements
 Nepali, Serbian, and Tamil. New stoplist: Indonesian. Adds gradle 'snowball'
 task to regenerate and ease future upgrades. (Robert Muir, Dawid Weiss)
+* LUCENE-9354: Improvements to snowball french stopwords list, so that it is less
+aggressive. (Philippe Ouellet)
 * LUCENE-9114: Improve ValueSourceScorer's Default Cost Implementation (Atri Sharma, David Smiley)
 * LUCENE-9074: Introduce Slice Executor For Dynamic Runtime Execution Of Slices (Atri Sharma)

@@ -51,7 +51,7 @@ qui | who
 sa | his, her (fem)
 se | oneself
 ses | his (pl)
-son | his, her (masc)
+ | son | his, her (masc). Omitted because it is homonym of "sound"
 sur | on
 ta | thy (fem)
 te | thee
@@ -79,15 +79,15 @@ t | t'
 y | there
 | forms of être (not including the infinitive):
-été
+ | été - Omitted because it is homonym of "summer"
 étée
 étées
-étés
+ | étés - Omitted because it is homonym of "summers"
 étant
 suis
 es
-est
-sommes
+ | est - Omitted because it is homonym of "east"
+ | sommes - Omitted because it is homonym of "sums"
 êtes
 sont
 serai
@@ -118,7 +118,7 @@ soyez
 soient
 fusse
 fusses
-fût
+ | fût - Omitted because it is homonym of "tap", like in "beer on tap"
 fussions
 fussiez
 fussent
@@ -130,13 +130,13 @@ eue
 eues
 eus
 ai
-as
+ | as - Omitted because it is homonym of "ace"
 avons
 avez
 ont
 aurai
-auras
-aura
+ | auras - Omitted because it is also the name of a kind of wind
+ | aura - Omitted because it is also the name of a kind of wind and homonym of "aura"
 aurons
 aurez
 auront
@@ -147,7 +147,7 @@ auriez
 auraient
 avais
 avait
-avions
+ | avions - Omitted because it is homonym of "planes"
 aviez
 avaient
 eut
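
The hunks above drop words from the stop list by turning each entry into a comment. In the Snowball stop word format, everything from the first `|` to the end of the line is a comment, so a line that starts with `|` contributes no words. The following is a minimal sketch of that rule, assuming whitespace-separated words per line. The class and method names are hypothetical and this is not Lucene's own loader.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration of the Snowball stop word file convention.
final class SnowballStopwordSketch {

    // Collects the words that remain active after stripping '|' comments.
    static Set<String> parse(List<String> lines) {
        Set<String> words = new LinkedHashSet<>();
        for (String line : lines) {
            // Everything from the first '|' onward is a comment.
            int bar = line.indexOf('|');
            String content = (bar >= 0 ? line.substring(0, bar) : line).trim();
            if (content.isEmpty()) {
                continue; // blank line or comment-only line
            }
            // A line may carry several whitespace-separated words.
            for (String word : content.split("\\s+")) {
                words.add(word);
            }
        }
        return words;
    }

    public static void main(String[] args) {
        List<String> sample = List.of(
            "ses | his (pl)",
            " | son | his, her (masc). Omitted because it is homonym of \"sound\"",
            "sur | on",
            " | avions - Omitted because it is homonym of \"planes\"",
            "aviez");
        System.out.println(parse(sample)); // prints [ses, sur, aviez]
    }
}
```

With the sample lines taken from this diff, "son" and "avions" no longer reach the stop set, so they would be indexed and searchable like any other term.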