mirror of https://github.com/apache/lucene.git
LUCENE-3690: Added info about changes in HTMLStripCharFilter surrogate handling to solr/CHANGES.txt.
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1234867 13f79535-47bb-0310-9956-ffa450edef68
parent 7fafdd3576
commit 9157338066
@@ -513,6 +513,11 @@ Bug Fixes
from Unicode character classes [:ID_Start:] and [:ID_Continue:].
- Uppercase character entities "&QUOT;", "&COPY;", "&GT;", "&LT;", "&REG;",
and "&AMP;" are now recognized and handled as if they were in lowercase.
- The REPLACEMENT CHARACTER U+FFFD is now used to replace numeric character
entities for unpaired UTF-16 low and high surrogates (in the range
[U+D800-U+DFFF]).
- Properly paired numeric character entities for UTF-16 surrogates are now
converted to the corresponding code units.
- Opening tags with unbalanced quotation marks are now properly stripped.
- Literal "<" and ">" characters in opening tags, regardless of whether they
appear inside quotation marks, now inhibit recognition (and stripping) of
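
The surrogate handling described in the hunk above can be illustrated with a minimal Java sketch. This is not the HTMLStripCharFilter implementation; the class name `SurrogateEntitySketch`, the method `decode`, and the regular expression are hypothetical and exist only for illustration. It assumes that a high-surrogate entity counts as "properly paired" only when a low-surrogate entity follows it immediately, and it leaves out-of-range numeric entities untouched, which is a choice of this sketch rather than documented behavior.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative sketch only: decodes numeric character entities and applies
// the surrogate rules from the changelog entry. Not the Lucene/Solr code.
final class SurrogateEntitySketch {
    // Matches hexadecimal (&#xD83D;) or decimal (&#55357;) numeric entities.
    private static final Pattern NUMERIC_ENTITY =
        Pattern.compile("&#(?:[xX]([0-9A-Fa-f]{1,6})|([0-9]{1,7}));");

    private static final char REPLACEMENT = '\uFFFD';

    static String decode(String input) {
        StringBuilder out = new StringBuilder();
        Matcher m = NUMERIC_ENTITY.matcher(input);
        int pos = 0;          // end of the previously consumed match
        int pendingHigh = -1; // high surrogate awaiting a low partner, or -1
        while (m.find()) {
            if (m.start() > pos) {
                // Intervening text means a pending high surrogate is unpaired.
                if (pendingHigh >= 0) {
                    out.append(REPLACEMENT);
                    pendingHigh = -1;
                }
                out.append(input, pos, m.start());
            }
            pos = m.end();
            int value = m.group(1) != null
                ? Integer.parseInt(m.group(1), 16)
                : Integer.parseInt(m.group(2), 10);
            boolean isHigh = value >= 0xD800 && value <= 0xDBFF;
            boolean isLow = value >= 0xDC00 && value <= 0xDFFF;
            if (isLow && pendingHigh >= 0) {
                // Properly paired: emit both UTF-16 code units.
                out.append((char) pendingHigh).append((char) value);
                pendingHigh = -1;
                continue;
            }
            if (pendingHigh >= 0) {
                // The earlier high surrogate never received a low partner.
                out.append(REPLACEMENT);
                pendingHigh = -1;
            }
            if (isHigh) {
                pendingHigh = value;
            } else if (isLow) {
                out.append(REPLACEMENT); // unpaired low surrogate
            } else if (Character.isValidCodePoint(value)) {
                out.appendCodePoint(value);
            } else {
                out.append(m.group()); // out of range: left as-is in this sketch
            }
        }
        if (pendingHigh >= 0) {
            out.append(REPLACEMENT);
        }
        out.append(input, pos, input.length());
        return out.toString();
    }

    public static void main(String[] args) {
        // A paired entity sequence decodes to one supplementary code point.
        String paired = decode("&#xD83D;&#xDE00;");
        System.out.println(paired.codePointCount(0, paired.length()));
        // An unpaired high surrogate becomes U+FFFD.
        System.out.println(decode("&#xD83D;x").charAt(0) == REPLACEMENT);
    }
}
```

Tracking a single pending high surrogate keeps the sketch to one pass over the input; a real scanner would encode the same pairing rule in its grammar.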