LUCENE-3690: Added info about changes in HTMLStripCharFilter surrogate handling to solr/CHANGES.txt.

git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1234867 13f79535-47bb-0310-9956-ffa450edef68
2025-02-20 17:07:09 +00:00 · 2012-01-23 15:56:06 +00:00 · 2012-01-23 15:56:06 +00:00 · 9157338066
commit 9157338066
parent 7fafdd3576
1 changed file with 5 additions and 0 deletions
--- a/solr/CHANGES.txt
+++ b/solr/CHANGES.txt
@@ -513,6 +513,11 @@ Bug Fixes
    from Unicode character classes [:ID_Start:] and [:ID_Continue:].
  - Uppercase character entities "&QUOT;", "&COPY;", "&GT;", "&LT;", "&REG;",
    and "&AMP;" are now recognized and handled as if they were in lowercase.
+  - The REPLACEMENT CHARACTER U+FFFD is now used to replace numeric character 
+    entities for unpaired UTF-16 low and high surrogates (in the range
+    [U+D800-U+DFFF]).
+  - Properly paired numeric character entities for UTF-16 surrogates are now
+    converted to the corresponding code units.
  - Opening tags with unbalanced quotation marks are now properly stripped.
  - Literal "<" and ">" characters in opening tags, regardless of whether they
    appear inside quotation marks, now inhibit recognition (and stripping) of