lucene/contrib/similarity
David Spencer 1d68f8c88d Logic to ignore stop words was in an early version of this code, but it was taken out in the belief that there
was no point in explicitly looking for them, as the scoring algorithm would effectively ignore them.

I tested this by indexing 700 pages from a corporate web site and running the MoreLikeThis code on them:
half of the docs had stop words identified as interesting.

So I added code to ignore stop words, but made it backward compatible: by default this code
is not used.

git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@169512 13f79535-47bb-0310-9956-ffa450edef68
2005-05-10 19:29:56 +00:00
src/java/org/apache/lucene/search/similar Logic to ignore stop words was in an early version of this code, but it was taken out in the belief that there 2005-05-10 19:29:56 +00:00
.cvsignore move similarity and spellchecker to new contrib area 2005-02-11 18:11:05 +00:00
README.txt test checkin of README, just to verify my permissions 2005-05-09 19:25:40 +00:00
build.xml overhaul of build system to facilitate building and packaging of contrib sub-projects. some work still to be done, but core Lucene build still working fine 2005-05-02 00:11:11 +00:00

README.txt

Document similarity measures.
The most significant contribution here is MoreLikeThis,
in /src/java/org/apache/lucene/search/similar.
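
The stop-word change described in the latest commit can be illustrated with a
small, self-contained sketch. This is not the MoreLikeThis source: the class
InterestingTermsSketch and its methods are hypothetical, and it ranks terms by
raw frequency as a simplifying assumption in place of Lucene's own scoring. It
only shows the idea of an optional stop-word set that is off by default, so
existing callers see no change in behaviour.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/** Hypothetical illustration only; not the actual MoreLikeThis code. */
public class InterestingTermsSketch {

    // null means "no stop-word filtering", which preserves the old behaviour.
    private Set<String> stopWords = null;

    /** Opt in to stop-word filtering; pass null to switch it off again. */
    public void setStopWords(Set<String> stopWords) {
        this.stopWords = stopWords;
    }

    private boolean isNoiseWord(String term) {
        return stopWords != null && stopWords.contains(term);
    }

    /**
     * Returns up to maxTerms terms, most frequent first.
     * Assumption: raw term frequency stands in for a real relevance score.
     */
    public List<String> interestingTerms(String text, int maxTerms) {
        Map<String, Integer> freq = new HashMap<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (token.isEmpty() || isNoiseWord(token)) {
                continue; // skip empty tokens and configured stop words
            }
            freq.merge(token, 1, Integer::sum);
        }
        List<String> terms = new ArrayList<>(freq.keySet());
        // Descending frequency; ties broken alphabetically for determinism.
        terms.sort((a, b) -> {
            int byFreq = freq.get(b) - freq.get(a);
            return byFreq != 0 ? byFreq : a.compareTo(b);
        });
        return new ArrayList<>(terms.subList(0, Math.min(maxTerms, terms.size())));
    }

    public static void main(String[] args) {
        String page = "The search engine and the index: the engine reads the index.";
        InterestingTermsSketch sketch = new InterestingTermsSketch();

        // Default: no filtering, so a stop word can rank as "interesting".
        System.out.println(sketch.interestingTerms(page, 3));

        // Opt in: stop words are skipped before ranking.
        sketch.setStopWords(new HashSet<>(Arrays.asList("the", "and")));
        System.out.println(sketch.interestingTerms(page, 3));
    }
}
```

In this sketch, leaving the stop-word set unset lets "the" rank first for the
sample text, while supplying a set removes it from the result. Keeping the
default as "no filtering" is what makes the change backward compatible.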