FIX: Unescape hash section when present to account for URL-encoded chars
Section names containing characters outside the URL "unreserved" set (for example, non-ASCII letters) appear URL-encoded in the hash and must be unescaped before use. In this case Wikipedia generates two different spans for the same section on the page. One span's id is the encoded string with each % replaced by a ".". The other span's id is the decoded string. For example, for /wiki/foo#A%C3%A1A it generates:

<span id="A.C3.A1A"></span>
<span id="AáA">AáA</span>

Unescaping `m_url_hash_name` should therefore target the proper section span in all cases.
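The two span ids described above can be derived from the raw hash as follows. This is a minimal sketch: `wikipedia_section_ids` is a hypothetical helper written for illustration and is not part of Onebox. It assumes Wikipedia's dotted form is simply the encoded string with `%` swapped for `.`, as the example suggests.

```ruby
require "cgi"

# Hypothetical helper (not part of Onebox): given the raw, percent-encoded
# fragment from a Wikipedia URL, return both span ids described in the
# commit message.
def wikipedia_section_ids(hash_name)
  {
    # Assumption: the dotted id is the encoded string with every % turned into a dot.
    dotted: hash_name.tr("%", "."),
    # The decoded id, which is what the patched Onebox lookup targets.
    decoded: CGI.unescape(hash_name)
  }
end

ids = wikipedia_section_ids("A%C3%A1A")
puts ids[:dotted]  # prints A.C3.A1A
puts ids[:decoded] # prints AáA
```

Onebox only needs the decoded form. Because that span always exists alongside the dotted one, a single lookup is enough.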
parent 745b99edbf
commit d27d7c8cca
@@ -24,7 +24,7 @@ module Onebox
     end
 
     unless m_url_hash.nil?
-      section_header_title = raw.xpath("//span[@id='#{m_url_hash_name}']")
+      section_header_title = raw.xpath("//span[@id='#{CGI.unescape(m_url_hash_name)}']")
 
       if section_header_title.empty?
         paras = raw.search("p") # default get all the paras
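The effect of the changed line can be checked by comparing the XPath string built before and after the patch. This is a sketch under one assumption: `section_span_xpath` and `old_section_span_xpath` are hypothetical helpers that copy only the string interpolation from the diff. They do not run Nokogiri or touch the rest of the Onebox engine.

```ruby
require "cgi"

# Hypothetical helper reproducing the pre-patch selector: the raw,
# percent-encoded hash is interpolated directly into the XPath.
def old_section_span_xpath(m_url_hash_name)
  "//span[@id='#{m_url_hash_name}']"
end

# Hypothetical helper reproducing the patched selector: the hash is
# unescaped first, so it matches the span whose id is the decoded title.
def section_span_xpath(m_url_hash_name)
  "//span[@id='#{CGI.unescape(m_url_hash_name)}']"
end

puts old_section_span_xpath("A%C3%A1A") # prints //span[@id='A%C3%A1A']
puts section_span_xpath("A%C3%A1A")     # prints //span[@id='AáA']
```

The old selector matches neither of Wikipedia's spans, because neither id contains `%`. For plain ASCII hashes both selectors are identical, so existing links keep working. One caveat is that `CGI.unescape` also converts `+` into a space, so it is worth checking how section titles containing a literal `+` are encoded.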