FIX: properly clean Thunderbird emails, don't remove links (#16543)

Mozilla Thunderbird email client add links into HTML as follows:

`<p>The link: <a class="moz-txt-link-freetext" href="https://google.com">https://google.com</a></p>`

Current filtering rules strip out the link, leaving only `<p>The link: </p>`.
Properly strip only unnecessary information: quote prefix, signature, forwarded message header.
This commit is contained in:
ValdikSS 2022-04-25 19:57:56 +03:00 committed by GitHub
parent 922fbe82da
commit f7540aa52f
No known key found for this signature in database
GPG Key ID: 4AEE18F83AFDEB23
1 changed files with 3 additions and 1 deletions

View File

@ -471,7 +471,9 @@ module Email
def extract_from_mozilla(doc)
# Mozilla (Thunderbird ?) properly identifies signature and forwarded emails
# Remove them and anything that comes after
elided = doc.css("*[class^='moz-'], *[class^='moz-'] ~ *").remove
elided = doc.css("*[class^='moz-cite'], *[class^='moz-cite'] ~ *, " \
"*[class^='moz-signature'], *[class^='moz-signature'] ~ *, " \
"*[class^='moz-forward'], *[class^='moz-forward'] ~ *").remove
to_markdown(doc.to_html, elided.to_html)
end