From 915733806692ddf2e47f1bfbc842025bc4b4c3a0 Mon Sep 17 00:00:00 2001
From: Steven Rowe
Date: Mon, 23 Jan 2012 15:56:06 +0000
Subject: [PATCH] LUCENE-3690: Added info about changes in HTMLStripCharFilter surrogate handling to solr/CHANGES.txt.

git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1234867 13f79535-47bb-0310-9956-ffa450edef68
---
 solr/CHANGES.txt | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/solr/CHANGES.txt b/solr/CHANGES.txt
index 8a33b2e8728..aa4112a57a3 100644
--- a/solr/CHANGES.txt
+++ b/solr/CHANGES.txt
@@ -513,6 +513,11 @@ Bug Fixes
     from Unicode character classes [:ID_Start:] and [:ID_Continue:].
   - Uppercase character entities "&QUOT;", "&COPY;", "&GT;", "&LT;", "&REG;",
     and "&AMP;" are now recognized and handled as if they were in lowercase.
+  - The REPLACEMENT CHARACTER U+FFFD is now used to replace numeric character
+    entities for unpaired UTF-16 low and high surrogates (in the range
+    [U+D800-U+DFFF]).
+  - Properly paired numeric character entities for UTF-16 surrogates are now
+    converted to the corresponding code units.
   - Opening tags with unbalanced quotation marks are now properly stripped.
   - Literal "<" and ">" characters in opening tags, regardless of whether they
     appear inside quotation marks, now inhibit recognition (and stripping) of
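The two new CHANGES entries describe how HTMLStripCharFilter treats numeric character entities that name UTF-16 surrogates. The Java sketch below illustrates that rule in isolation. It is not Lucene's code. The class `SurrogateEntityDemo` and the method `decodeNumericEntities` are hypothetical names. The sketch assumes that a high-surrogate entity counts as "properly paired" only when a low-surrogate entity follows it immediately. It also assumes that entities naming code points above U+10FFFF are copied through unchanged.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class SurrogateEntityDemo {
    // Decimal (&#55357;) and hex (&#xD83D;) numeric character references.
    // Digit counts are capped so Integer.parseInt cannot overflow.
    private static final Pattern NUMERIC_ENTITY =
            Pattern.compile("&#(?:[xX]([0-9a-fA-F]{1,6})|([0-9]{1,7}));");

    private static final char REPLACEMENT = '\uFFFD';

    /** One matched entity: its span in the input and the code point it names. */
    private static final class Entity {
        final int start, end, codePoint;
        Entity(int start, int end, int codePoint) {
            this.start = start;
            this.end = end;
            this.codePoint = codePoint;
        }
    }

    private static boolean isHigh(int cp) {
        return cp >= Character.MIN_HIGH_SURROGATE && cp <= Character.MAX_HIGH_SURROGATE;
    }

    private static boolean isLow(int cp) {
        return cp >= Character.MIN_LOW_SURROGATE && cp <= Character.MAX_LOW_SURROGATE;
    }

    /** Hypothetical helper: decodes numeric entities using the surrogate rules above. */
    static String decodeNumericEntities(String input) {
        // Collect all entities first so a high surrogate can look at its successor.
        List<Entity> entities = new ArrayList<>();
        Matcher m = NUMERIC_ENTITY.matcher(input);
        while (m.find()) {
            int cp = m.group(1) != null
                    ? Integer.parseInt(m.group(1), 16)
                    : Integer.parseInt(m.group(2));
            entities.add(new Entity(m.start(), m.end(), cp));
        }

        StringBuilder out = new StringBuilder(input.length());
        int pos = 0; // first input index not yet copied to out
        for (int i = 0; i < entities.size(); i++) {
            Entity e = entities.get(i);
            out.append(input, pos, e.start); // plain text before this entity
            pos = e.end;
            int cp = e.codePoint;
            if (isHigh(cp)) {
                Entity next = i + 1 < entities.size() ? entities.get(i + 1) : null;
                if (next != null && next.start == e.end && isLow(next.codePoint)) {
                    // Properly paired: emit both UTF-16 code units.
                    out.append((char) cp).append((char) next.codePoint);
                    pos = next.end;
                    i++; // the low surrogate has been consumed
                } else {
                    out.append(REPLACEMENT); // unpaired high surrogate
                }
            } else if (isLow(cp)) {
                out.append(REPLACEMENT); // low surrogate with no preceding high surrogate
            } else if (cp <= Character.MAX_CODE_POINT) {
                out.appendCodePoint(cp);
            } else {
                out.append(input, e.start, e.end); // out of range: copied unchanged (assumption)
            }
        }
        out.append(input, pos, input.length());
        return out.toString();
    }

    public static void main(String[] args) {
        // Paired entities for U+1F600 decode to one supplementary code point.
        String paired = decodeNumericEntities("a&#xD83D;&#xDE00;b");
        System.out.println(paired.codePointAt(1) == 0x1F600);
        // Unpaired high (decimal 55357 = 0xD83D) and low surrogates each become U+FFFD.
        String unpaired = decodeNumericEntities("&#55357;x&#xDC00;");
        System.out.println(unpaired.equals("\uFFFDx\uFFFD"));
    }
}
```

Running `main` prints `true` twice. Collecting all matches before emitting output keeps the pairing check simple: a high surrogate only needs to inspect the next match and confirm that it starts exactly where the current entity ends.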