From 6961d9f589c16bc635f46019e9bdecdbc6c0f97c Mon Sep 17 00:00:00 2001 From: "Chris M. Hostetter" Date: Tue, 31 Jul 2012 00:34:01 +0000 Subject: [PATCH] SOLR-3650: checkpoint - merged in CHANGES.txt entries from contrib/analysis-extras contrib/langid contrib/clustering git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1367377 13f79535-47bb-0310-9956-ffa450edef68 --- solr/CHANGES.txt | 88 ++++++++++++++++++++++++ solr/contrib/analysis-extras/CHANGES.txt | 63 ----------------- solr/contrib/clustering/CHANGES.txt | 87 ----------------------- solr/contrib/langid/CHANGES.txt | 36 ---------- 4 files changed, 88 insertions(+), 186 deletions(-) delete mode 100644 solr/contrib/analysis-extras/CHANGES.txt delete mode 100644 solr/contrib/clustering/CHANGES.txt delete mode 100644 solr/contrib/langid/CHANGES.txt diff --git a/solr/CHANGES.txt b/solr/CHANGES.txt index 0520875f5a0..fde092f626d 100644 --- a/solr/CHANGES.txt +++ b/solr/CHANGES.txt @@ -554,6 +554,11 @@ New Features * SOLR-3542: Add WeightedFragListBuilder for FVH and set it to default fragListBuilder in example solrconfig.xml. (Sebastian Lutze, koji) +* SOLR-2396: Add ICUCollationField to contrib/analysis-extras, which is much + more efficient than the Solr 3.x ICUCollationKeyFilterFactory, and also + supports Locale-sensitive range queries. (rmuir) + + Optimizations ---------------------- @@ -701,6 +706,10 @@ Bug Fixes the hashCode implementation of {!bbox} and {!geofilt} queries. (hossman) +* SOLR-3470: contrib/clustering: custom Carrot2 tokenizer and stemmer factories + are respected now (Stanislaw Osinski, Dawid Weiss) + + Other Changes ---------------------- @@ -886,6 +895,9 @@ Bug Fixes: * SOLR-3477: SOLR does not start up when no cores are defined (Tomás Fernández Löbbe via tommaso) +* SOLR-3470: contrib/clustering: custom Carrot2 tokenizer and stemmer factories + are respected now (Stanislaw Osinski, Dawid Weiss) + ================== 3.6.0 ================== More information about this release, including any errata related to the release notes, upgrade instructions, or other changes may be found online at: @@ -1028,6 +1040,11 @@ New Features exception from being thrown by the default parser if "q" is missing. (yonik) SOLR-435: if q is "" then it's also acceptable. (dsmiley, hoss) +* SOLR-2919: Added parametric tailoring options to ICUCollationKeyFilterFactory. + These can be used to customize range query/sort behavior, for example to + support numeric collation, ignore punctuation/whitespace, ignore accents but + not case, control whether upper/lowercase values are sorted first, etc. (rmuir) + Optimizations ---------------------- * SOLR-1931: Speedup for LukeRequestHandler and admin/schema browser. New parameter @@ -1189,6 +1206,35 @@ Bug Fixes * SOLR-3316: Distributed grouping failed when rows parameter was set to 0 and sometimes returned a wrong hit count as matches. (Cody Young, Martijn van Groningen) +* SOLR-3107: contrib/langid: When using the LangDetect implementation of + langid, set the random seed to 0, so that the same document is detected as + the same language with the same probability every time. + (Christian Moen via rmuir) + +* SOLR-2937: Configuring the number of contextual snippets used for + search results clustering. The hl.snippets parameter is now respected + by the clustering plugin, can be overridden by carrot.summarySnippets + if needed (Stanislaw Osinski). + +* SOLR-2938: Clustering on multiple fields. The carrot.title and + carrot.snippet can now take comma- or space-separated lists of + field names to cluster (Stanislaw Osinski). + +* SOLR-2939: Clustering of multilingual search results. The document's + language field be passed in the carrot.lang parameter, the carrot.lcmap + parameter enables mapping of language codes to ISO 639 (Stanislaw Osinski). + +* SOLR-2940: Passing values for custom Carrot2 fields to Clustering component. + The custom field mapping are defined using the carrot.custom parameter + (Stanislaw Osinski). + +* SOLR-2941: NullPointerException on clustering component initialization + when schema does not have a unique key field (Stanislaw Osinski). + +* SOLR-2942: ClassCastException when passing non-textual fields to + clustering component (Stanislaw Osinski). + + Other Changes ---------------------- * SOLR-2922: Upgrade commons-io and commons-lang to 2.1 and 2.6, respectively. (koji) @@ -1294,6 +1340,9 @@ New Features request param that can be used to delete all but the most recent N backups. (James Dyer via hossman) +* SOLR-2839: Add alternative implementation to contrib/langid supporting 53 + languages, based on http://code.google.com/p/language-detection/ (rmuir) + Optimizations ---------------------- @@ -1516,6 +1565,12 @@ Bug Fixes failed due to sort by function changes introduced in SOLR-1297 (Mitsu Hadeishi, hossman) +* SOLR-2706: contrib/clustering: The carrot.lexicalResourcesDir parameter + now works with absolute directories (Stanislaw Osinski) + +* SOLR-2692: contrib/clustering: Typo in param name fixed: "carrot.fragzise" + changed to "carrot.fragSize" (Stanislaw Osinski). + Other Changes ---------------------- @@ -1671,6 +1726,12 @@ New Features Explanation objects in it's responses instead of Explanation.toString (hossman) +* SOLR-2448: Search results clustering updates: bisecting k-means + clustering algorithm added, loading of Carrot2 stop words from + /conf/carrot2 (SOLR-2449), using Solr's stopwords.txt + for clustering (SOLR-2450), output of cluster scores (SOLR-2505) + (Stanislaw Osinski, Dawid Weiss). + Optimizations ---------------------- @@ -2014,6 +2075,26 @@ New Features * SOLR-1057: Add PathHierarchyTokenizerFactory. (ryan, koji) +* SOLR-1804: Re-enabled clustering component on trunk, updated to latest + version of Carrot2. No more LGPL run-time dependencies. This release of + C2 also does not have a specific Lucene dependency. + (Stanislaw Osinski, gsingers) + +* SOLR-2282: Add distributed search support for search result clustering. + (Brad Giaccio, Dawid Weiss, Stanislaw Osinski, rmuir, koji) + +* SOLR-2210: Add icu-based tokenizer and filters to contrib/analysis-extras (rmuir) + +* SOLR-1336: Add SmartChinese (word segmentation for Simplified Chinese) + tokenizer and filters to contrib/analysis-extras (rmuir) + +* SOLR-2211,LUCENE-2763: Added UAX29URLEmailTokenizerFactory, which implements + UAX#29, a unicode algorithm with good results for most languages, as well as + URL and E-mail tokenization according to the relevant RFCs. + (Tom Burton-West via rmuir) + +* SOLR-2237: Added StempelPolishStemFilterFactory to contrib/analysis-extras (rmuir) + Optimizations ---------------------- @@ -2035,6 +2116,10 @@ Optimizations * SOLR-2046: add common functions to scripts-util. (koji) +* SOLR-1684: Switch clustering component to use the + SolrIndexSearcher.doc(int, Set) method b/c it can use the document + cache (gsingers) + Bug Fixes ---------------------- * SOLR-1769: Solr 1.4 Replication - Repeater throwing NullPointerException (Jörgen Rydenius via noble) @@ -2289,6 +2374,9 @@ Bug Fixes * SOLR-2192: StreamingUpdateSolrServer.blockUntilFinished was not thread safe and could throw an exception. (yonik) +* SOLR-1692: Fix bug in clustering component relating to carrot.produceSummary + option (gsingers) + Other Changes ---------------------- diff --git a/solr/contrib/analysis-extras/CHANGES.txt b/solr/contrib/analysis-extras/CHANGES.txt deleted file mode 100644 index 22ff964f1e1..00000000000 --- a/solr/contrib/analysis-extras/CHANGES.txt +++ /dev/null @@ -1,63 +0,0 @@ - Apache Solr - Analysis Extras - Release Notes - -Introduction ------------- -The analysis-extras plugin provides additional analyzers that rely -upon large dependencies/dictionaries. - -It includes integration with ICU for multilingual support, and -analyzers for Chinese and Polish. - - -$Id$ -================== 5.0.0 ============== - - (No changes) - -================== 4.0.0-ALPHA ============== - -* SOLR-2396: Add ICUCollationField, which is much more efficient than - the Solr 3.x ICUCollationKeyFilterFactory, and also supports - Locale-sensitive range queries. (rmuir) - -================== 3.6.1 ================== - -(No Changes) - -================== 3.6.0 ================== - -* SOLR-2919: Added parametric tailoring options to ICUCollationKeyFilterFactory. - These can be used to customize range query/sort behavior, for example to - support numeric collation, ignore punctuation/whitespace, ignore accents but - not case, control whether upper/lowercase values are sorted first, etc. (rmuir) - -================== 3.5.0 ================== - -(No Changes) - -================== 3.4.0 ================== - -(No Changes) - -================== 3.3.0 ================== - -(No Changes) - -================== 3.2.0 ================== - -(No Changes) - -================== 3.1.0 ================== - -* SOLR-2210: Add icu-based tokenizer and filters to contrib/analysis-extras (rmuir) - -* SOLR-1336: Add SmartChinese (word segmentation for Simplified Chinese) - tokenizer and filters to contrib/analysis-extras (rmuir) - -* SOLR-2211,LUCENE-2763: Added UAX29URLEmailTokenizerFactory, which implements - UAX#29, a unicode algorithm with good results for most languages, as well as - URL and E-mail tokenization according to the relevant RFCs. - (Tom Burton-West via rmuir) - -* SOLR-2237: Added StempelPolishStemFilterFactory to contrib/analysis-extras (rmuir) diff --git a/solr/contrib/clustering/CHANGES.txt b/solr/contrib/clustering/CHANGES.txt deleted file mode 100644 index 5469984c7e8..00000000000 --- a/solr/contrib/clustering/CHANGES.txt +++ /dev/null @@ -1,87 +0,0 @@ -Apache Solr Clustering Implementation - -Intro: - -See http://wiki.apache.org/solr/ClusteringComponent - -CHANGES - -$Id$ -================== Release 5.0.0 ============== - - (No changes) - -================== Release 4.0.0-ALPHA ============== - -* SOLR-3470: Bug fix: custom Carrot2 tokenizer and stemmer factories are - respected now (Stanislaw Osinski, Dawid Weiss) - -================== Release 3.6.1 ================== - -* SOLR-3470: Bug fix: custom Carrot2 tokenizer and stemmer factories are - respected now (Stanislaw Osinski, Dawid Weiss) - -================== Release 3.6.0 ================== - -* SOLR-2937: Configuring the number of contextual snippets used for - search results clustering. The hl.snippets parameter is now respected - by the clustering plugin, can be overridden by carrot.summarySnippets - if needed (Stanislaw Osinski). - -* SOLR-2938: Clustering on multiple fields. The carrot.title and - carrot.snippet can now take comma- or space-separated lists of - field names to cluster (Stanislaw Osinski). - -* SOLR-2939: Clustering of multilingual search results. The document's - language field be passed in the carrot.lang parameter, the carrot.lcmap - parameter enables mapping of language codes to ISO 639 (Stanislaw Osinski). - -* SOLR-2940: Passing values for custom Carrot2 fields. The custom field - mapping are defined using the carrot.custom parameter (Stanislaw Osinski). - -* SOLR-2941: NullPointerException on clustering component initialization - when schema does not have a unique key field (Stanislaw Osinski). - -* SOLR-2942: ClassCastException when passing non-textual fields for - clustering (Stanislaw Osinski). - -================== Release 3.5.0 ================== - -(No Changes) - -================== Release 3.4.0 ================== - -* SOLR-2706: The carrot.lexicalResourcesDir parameter now works - with absolute directories (Stanislaw Osinski) - -* SOLR-2692: Typo in param name fixed: "carrot.fragzise" changed to - "carrot.fragSize" (Stanislaw Osinski). - -================== Release 3.3.0 ================== - -(No Changes) - -================== Release 3.2.0 ================== - -* SOLR-2448: Search results clustering updates: bisecting k-means - clustering algorithm added, loading of Carrot2 stop words from - /conf/carrot2 (SOLR-2449), using Solr's stopwords.txt - for clustering (SOLR-2450), output of cluster scores (SOLR-2505) - (Stanislaw Osinski, Dawid Weiss). - -================== Release 3.1.0 ================== - -* SOLR-1684: Switch to use the SolrIndexSearcher.doc(int, Set) method b/c it can use the document cache (gsingers) - -* SOLR-1692: Fix bug relating to carrot.produceSummary option (gsingers) - -* SOLR-1804: Re-enabled clustering on trunk, updated to latest version of Carrot2. No more LGPL run-time dependencies. - This release of C2 also does not have a specific Lucene dependency. (Stanislaw Osinski, gsingers) - -* SOLR-2282: Add distributed search support for search result clustering. - (Brad Giaccio, Dawid Weiss, Stanislaw Osinski, rmuir, koji) - -================== Release 1.4.0 ================== - -Solr Clustering will be released for the first time in Solr 1.4. See http://wiki.apache.org/solr/ClusteringComponent - for details on using. diff --git a/solr/contrib/langid/CHANGES.txt b/solr/contrib/langid/CHANGES.txt deleted file mode 100644 index f6a7d19f75d..00000000000 --- a/solr/contrib/langid/CHANGES.txt +++ /dev/null @@ -1,36 +0,0 @@ -Apache Solr Language Identifier - Release Notes - -This file describes changes to the SolrTika Language Identifier (contrib/langid) module. -See http://wiki.apache.org/solr/LanguageDetection for details - - -$Id$ - -================== Release 5.0.0 ================== - -(No changes) - -================== Release 4.0.0-ALPHA ================== - -(No changes) - -================== Release 3.6.1 ================== - -(No Changes) - -================== Release 3.6.0 ================== - -* SOLR-3107: When using the LangDetect implementation of langid, set the random - seed to 0, so that the same document is detected as the same language with - the same probability every time. (Christian Moen via rmuir) - -================== Release 3.5.0 ================== - -Initial release. See README.txt. - -* SOLR-1979: New contrib "langid". Adds language identification capabilities as an - Update Processor, using Tika's LanguageIdentifier (janhoy, Tommaso Teofili, gsingers) - -* SOLR-2839: Add alternative implementation supporting 53 languages, - based on http://code.google.com/p/language-detection/ (rmuir)