FIX: Unescape the URL hash section, when present, to account for URL-encoded chars

Section names containing characters outside the unreserved set appear URL-encoded in the URL hash and need to be unescaped before they are used. In this case Wikipedia generates 2 different spans in the same page: one whose id replaces each % symbol with a dot, and another whose id is the decoded version of the string. For example, for /wiki/foo#A%C3%A1A it will generate:

    <span id="A.C3.A1A"></span>
    <span id="AáA">AáA</span>

Unescaping `m_url_hash_name` should work in all cases to target the proper section span.
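
As a minimal sketch of the behaviour described above, the snippet below decodes the example fragment and builds the same kind of XPath expression as the patched line. The variable names are hypothetical stand-ins (only `CGI.unescape` and the XPath shape come from the change itself), and the dotted id is derived here purely to illustrate the first span Wikipedia emits.

```ruby
require "cgi"

# Hypothetical stand-in for the fragment captured from the URL
# (the value held by m_url_hash_name in the onebox engine).
encoded = "A%C3%A1A"

# Decoded form: matches the id of the second span in the example.
decoded = CGI.unescape(encoded)

# Dotted form: matches the id of the first, empty span in the example.
dotted = encoded.tr("%", ".")

# Same XPath shape as the patched line, built from the decoded name.
xpath = "//span[@id='#{decoded}']"

puts decoded
puts dotted
puts xpath
```

Note that `CGI.unescape` also turns a literal `+` into a space, so a fragment that contains an unencoded `+` would not decode to itself.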
2021-08-11 21:56:55 +01:00
parent 745b99edbf
commit d27d7c8cca
1 changed file with 1 addition and 1 deletion
--- a/lib/onebox/engine/wikipedia_onebox.rb
+++ b/lib/onebox/engine/wikipedia_onebox.rb
@@ -24,7 +24,7 @@ module Onebox
        end

        unless m_url_hash.nil?
-          section_header_title = raw.xpath("//span[@id='#{m_url_hash_name}']")
+          section_header_title = raw.xpath("//span[@id='#{CGI.unescape(m_url_hash_name)}']")

          if section_header_title.empty?
            paras = raw.search("p") # default get all the paras