FIX: Unescapes hash section when present to account for url-encoded chars

Section names containing characters outside the URL unreserved set (such as
non-ASCII letters) appear url-encoded in the hash and need to be unescaped
before use.
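Here is a minimal sketch of that decoding step. It assumes the link is a full Wikipedia URL and uses only Ruby's standard library. The hash is read with `URI` and decoded with `CGI.unescape`, the same call the patch uses. The helper `decoded_section` is hypothetical; Onebox extracts the hash with its own code.

```ruby
require "uri"
require "cgi"

# Hypothetical helper: returns the decoded section name from a URL's
# fragment, or nil when the URL has no fragment.
def decoded_section(url)
  fragment = URI.parse(url).fragment # still percent-encoded, e.g. "A%C3%A1A"
  fragment && CGI.unescape(fragment)
end

puts decoded_section("https://en.wikipedia.org/wiki/Foo#A%C3%A1A") # prints AáA
```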

Wikipedia generates 2 different spans for such a section on the same page:
one whose id is produced by replacing each % symbol with a . and another
whose id is the decoded string. For example, for /wiki/foo#A%C3%A1A it
generates:

<span id="A.C3.A1A"></span>
<span id="AáA">AáA</span>
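For this example, two string operations reproduce the relationship between the ids. Replacing `%` with `.` gives the first id, and percent-decoding gives the second. This sketch only mirrors the pattern described above. It is not MediaWiki's actual anchor-generation code, which covers more cases. The helper name `candidate_ids` is hypothetical.

```ruby
require "cgi"

# Hypothetical helper: derive both span ids from a percent-encoded section
# name, following the pattern shown in the example above.
def candidate_ids(encoded)
  {
    dotted: encoded.tr("%", "."),  # "A%C3%A1A" -> "A.C3.A1A"
    decoded: CGI.unescape(encoded) # "A%C3%A1A" -> "AáA"
  }
end

p candidate_ids("A%C3%A1A")
```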

Unescaping `m_url_hash_name` before it is interpolated into the XPath query
should therefore target the proper section span in all cases.
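A single unescape can cover both spans. The sketch below looks up the unescaped hash among the two ids from the example. A hash copied from the address bar (`A%C3%A1A`) decodes to `AáA`. A hash already in dotted form (`A.C3.A1A`) contains no `%` escapes, so it passes through unchanged. The fixed id list stands in for the parsed page and is an assumption for illustration.

```ruby
require "cgi"

# Span ids from the example page; stands in for the parsed HTML.
SPAN_IDS = ["A.C3.A1A", "AáA"].freeze

# Hypothetical helper: returns the id of the span that the unescaped hash
# would match, or nil if none matches.
def matching_span(hash_name)
  target = CGI.unescape(hash_name)
  SPAN_IDS.find { |id| id == target }
end

puts matching_span("A%C3%A1A") # percent-encoded hash matches AáA
puts matching_span("A.C3.A1A") # dotted hash matches A.C3.A1A unchanged
```

One caveat: `CGI.unescape` decodes form-style, so a literal `+` in a hash is turned into a space.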
Chema Balsas 2021-08-11 21:56:55 +01:00 committed by Robin Ward
parent 745b99edbf
commit d27d7c8cca
1 changed file with 1 addition and 1 deletion


@@ -24,7 +24,7 @@ module Onebox
 end
 unless m_url_hash.nil?
-section_header_title = raw.xpath("//span[@id='#{m_url_hash_name}']")
+section_header_title = raw.xpath("//span[@id='#{CGI.unescape(m_url_hash_name)}']")
 if section_header_title.empty?
 paras = raw.search("p") # default get all the paras
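For the example hash, the change only affects the string interpolated into the XPath query. The sketch below builds both versions side by side. The method names are hypothetical; Onebox inlines the expression rather than calling a helper.

```ruby
require "cgi"

# Query built before the patch: the raw, still-encoded hash.
def old_section_xpath(hash_name)
  "//span[@id='#{hash_name}']"
end

# Query built after the patch: the hash is unescaped first.
def new_section_xpath(hash_name)
  "//span[@id='#{CGI.unescape(hash_name)}']"
end

puts old_section_xpath("A%C3%A1A") # //span[@id='A%C3%A1A'], matches neither span
puts new_section_xpath("A%C3%A1A") # //span[@id='AáA'], matches the second span
```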