LUCENE-3690: Added info about changes in HTMLStripCharFilter surrogate handling to solr/CHANGES.txt.

git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1234867 13f79535-47bb-0310-9956-ffa450edef68
Steven Rowe 2012-01-23 15:56:06 +00:00
parent 7fafdd3576
commit 9157338066
1 changed file with 5 additions and 0 deletions

@@ -513,6 +513,11 @@ Bug Fixes
from Unicode character classes [:ID_Start:] and [:ID_Continue:].
- Uppercase character entities "&QUOT;", "&COPY;", "&GT;", "&LT;", "&REG;",
and "&" are now recognized and handled as if they were in lowercase.
- The REPLACEMENT CHARACTER U+FFFD is now used to replace numeric character
entities for unpaired UTF-16 low and high surrogates (in the range
[U+D800-U+DFFF]).
- Properly paired numeric character entities for UTF-16 surrogates are now
converted to the corresponding code units.
- Opening tags with unbalanced quotation marks are now properly stripped.
- Literal "<" and ">" characters in opening tags, regardless of whether they
appear inside quotation marks, now inhibit recognition (and stripping) of