FIX: Replace all quote-like unicodes with quotes (#19714)

If unaccent is called with quote-like Unicode characters, it can
generate invalid queries: some of the quotes produced by unaccent are
not escaped, so to_tsquery fails because of bad input.

This commit replaces more quote-like Unicode characters before
unaccent is called.
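
As a minimal sketch of the normalization described above: the character class is taken from the patch, but the constant QUOTE_LIKE and the helper normalize_quotes are hypothetical names used only for illustration, and the SiteSetting.search_ignore_accents guard and the PG escaping step are left out.

```ruby
# Hypothetical constant: the quote-like code points listed in the patch.
QUOTE_LIKE = /[\u02b9\u02bb\u02bc\u02bd\u02c8\u2018\u2019\u201b\u2032\uff07]/

# Hypothetical helper: map every quote-like character to an ASCII
# apostrophe so that a later escaping step sees only plain quotes.
def normalize_quotes(term)
  term.gsub(QUOTE_LIKE, "'")
end

normalized = normalize_quotes("\u2018bad quotes\u2019")
puts normalized
```

Doing the replacement before escaping means the escaping step only ever has to handle the ASCII apostrophe.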
Bianca Nenciu 2023-01-09 19:19:51 +02:00 committed by GitHub
parent ba3a6da5db
commit fb780c50fd
2 changed files with 11 additions and 3 deletions

@@ -1230,9 +1230,11 @@ class Search
   end
   def self.escape_string(term)
-    # HACK: The ’ has to be "unaccented" before it is escaped or the resulting
-    # tsqueries will be invalid
-    term = term.gsub("\u{2019}", "'") if SiteSetting.search_ignore_accents
+    # HACK: The ’ and other similar characters have to be "unaccented" before
+    # it is escaped or the resulting tsqueries will be invalid
+    if SiteSetting.search_ignore_accents
+      term = term.gsub(/[\u02b9\u02bb\u02bc\u02bd\u02c8\u2018\u2019\u201b\u2032\uff07]/, "'")
+    end
     PG::Connection.escape_string(term).gsub('\\', '\\\\\\')
   end

@@ -645,6 +645,12 @@ RSpec.describe Topic do
       expect(Topic.similar_to("怎么上自己的", "")).to eq([])
     end
+    it "does not result in invalid statement when title contains unicode characters" do
+      SiteSetting.search_ignore_accents = true
+      expect(Topic.similar_to("'bad quotes'", "'bad quotes'")).to eq([])
+    end
     context "with a similar topic" do
       fab!(:post) do
         SearchIndexer.enable