FIX: properly clean Thunderbird emails, don't remove links (#16543)
The Mozilla Thunderbird email client adds links to the HTML body like this: `<p>The link: <a class="moz-txt-link-freetext" href="https://google.com">https://google.com</a></p>`. The current filtering rule removes every element whose class starts with `moz-`, along with its following siblings. That strips out the link and leaves only `<p>The link: </p>`. This change strips only the unnecessary parts: the quote prefix, the signature, and the forwarded-message header.
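The link was caught because `[class^='moz-']` is a prefix match on the whole `class` attribute string, and `moz-txt-link-freetext` starts with `moz-`. This plain-Ruby sketch compares the old and new prefixes without using Nokogiri. The `prefix_match?` helper is hypothetical, and `moz-cite-prefix` is only an assumed example of the class Thunderbird puts on the quote prefix.

```ruby
# Sketch of how the CSS attribute selector [class^='...'] decides a match:
# it compares the *whole* class attribute string against the prefix.
# The prefix lists mirror the selectors in this commit; the helper is hypothetical.
OLD_PREFIXES = ["moz-"].freeze
NEW_PREFIXES = ["moz-cite", "moz-signature", "moz-forward"].freeze

def prefix_match?(class_attr, prefixes)
  prefixes.any? { |prefix| class_attr.start_with?(prefix) }
end

link_class  = "moz-txt-link-freetext" # class Thunderbird puts on bare URLs
quote_class = "moz-cite-prefix"       # assumed class of the quote prefix line

puts prefix_match?(link_class, OLD_PREFIXES)  # true:  old rule removes the link
puts prefix_match?(link_class, NEW_PREFIXES)  # false: new rule keeps the link
puts prefix_match?(quote_class, NEW_PREFIXES) # true:  quote prefix is still removed
```

Narrowing the prefixes keeps the selector simple, but it still relies on a prefix match. Any future Thunderbird class that happens to start with `moz-cite`, `moz-signature` or `moz-forward` would also be elided.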
Parent: 922fbe82da
Commit: f7540aa52f
```diff
@@ -471,7 +471,9 @@ module Email
     def extract_from_mozilla(doc)
       # Mozilla (Thunderbird ?) properly identifies signature and forwarded emails
       # Remove them and anything that comes after
-      elided = doc.css("*[class^='moz-'], *[class^='moz-'] ~ *").remove
+      elided = doc.css("*[class^='moz-cite'], *[class^='moz-cite'] ~ *, " \
+                       "*[class^='moz-signature'], *[class^='moz-signature'] ~ *, " \
+                       "*[class^='moz-forward'], *[class^='moz-forward'] ~ *").remove
       to_markdown(doc.to_html, elided.to_html)
     end
 
```
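The comment "Remove them and anything that comes after" depends on the general sibling combinator: `X ~ *` selects every element that follows an `X` under the same parent. This sketch models `doc.css("X, X ~ *").remove` on the children of a single parent. It uses a hypothetical `Node` struct and `elide` helper, which are not part of Discourse or Nokogiri. It reproduces the `<p>The link: </p>` result of the old rule.

```ruby
# Hypothetical model of `doc.css("X, X ~ *").remove` within one parent element.
# `X ~ *` selects every *element* sibling that follows an X; text nodes are never
# selected by `*`, so they survive the removal.
Node = Struct.new(:type, :css_class, :text)

# Returns [kept, removed] for one parent's list of children.
def elide(children, prefixes)
  first = children.index do |node|
    node.type == :element && prefixes.any? { |p| node.css_class.to_s.start_with?(p) }
  end
  return [children, []] if first.nil?

  kept = []
  removed = []
  children.each_with_index do |node, i|
    # The first match and every element after it are selected.
    if i >= first && node.type == :element
      removed << node
    else
      kept << node
    end
  end
  [kept, removed]
end

# Children of <p>The link: <a class="moz-txt-link-freetext" ...>...</a></p>
paragraph = [
  Node.new(:text, nil, "The link: "),
  Node.new(:element, "moz-txt-link-freetext", "https://google.com")
]

old_kept, _old_removed = elide(paragraph, ["moz-"])
new_kept, _new_removed = elide(paragraph, ["moz-cite", "moz-signature", "moz-forward"])

puts old_kept.map(&:text).join.inspect # "The link: "
puts new_kept.map(&:text).join.inspect # "The link: https://google.com"
```

The text node survives because `*` matches only elements. That is why the old rule left `The link: ` behind instead of emptying the paragraph.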