From 915733806692ddf2e47f1bfbc842025bc4b4c3a0 Mon Sep 17 00:00:00 2001
From: Steven Rowe <sarowe@apache.org>
Date: Mon, 23 Jan 2012 15:56:06 +0000
Subject: [PATCH] LUCENE-3690: Added info about changes in HTMLStripCharFilter
 surrogate handling to solr/CHANGES.txt.

git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1234867 13f79535-47bb-0310-9956-ffa450edef68
---
 solr/CHANGES.txt | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/solr/CHANGES.txt b/solr/CHANGES.txt
index 8a33b2e8728..aa4112a57a3 100644
--- a/solr/CHANGES.txt
+++ b/solr/CHANGES.txt
@@ -513,6 +513,11 @@ Bug Fixes
     from Unicode character classes [:ID_Start:] and [:ID_Continue:].
   - Uppercase character entities "&QUOT;", "&COPY;", "&GT;", "&LT;", "&REG;",
     and "&AMP;" are now recognized and handled as if they were in lowercase.
+  - The REPLACEMENT CHARACTER U+FFFD is now used to replace numeric character
+    entities for unpaired UTF-16 low and high surrogates (in the range
+    [U+D800-U+DFFF]).
+  - Properly paired numeric character entities for UTF-16 surrogates are now
+    converted to the corresponding code units.
   - Opening tags with unbalanced quotation marks are now properly stripped.
   - Literal "<" and ">" characters in opening tags, regardless of whether they
     appear inside quotation marks, now inhibit recognition (and stripping) of