PERF: avoid following links in topic RSS feeds (#16145)

Topic RSS feeds contain many non-canonical links, such as:

- https://site.com/t/a-b-c/111/1
- https://site.com/t/a-b-c/111/2
- https://site.com/t/a-b-c/111/3
- https://site.com/t/a-b-c/111/4
- https://site.com/t/a-b-c/111/5
- https://site.com/t/a-b-c/111/6

Previously, crawlers did not index RSS feeds but still followed these
links.

This change makes crawlers ignore the links in RSS feeds entirely, which
avoids the expensive work of crawling them only to find that they should
not be indexed.

Sam 2022-03-09 18:25:20 +11:00 committed by GitHub
parent 28bb9e11f4
commit 43da88db6c
No known key found for this signature in database
GPG Key ID: 4AEE18F83AFDEB23
3 changed files with 8 additions and 1 deletions

@@ -897,7 +897,7 @@ class ApplicationController < ActionController::Base
   end
   def add_noindex_header
-    if request.get?
+    if request.get? && !response.headers['X-Robots-Tag']
       if SiteSetting.allow_index_in_robots_txt
         response.headers['X-Robots-Tag'] = 'noindex'
       else

@@ -922,6 +922,8 @@ class TopicsController < ApplicationController
     end
     discourse_expires_in 1.minute
+    response.headers['X-Robots-Tag'] = 'noindex, nofollow'
     render 'topics/show', formats: [:rss]
   end
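
The two controller hunks above work together: the feed action sets the stricter header first, and the new guard in `add_noindex_header` keeps the default from overwriting it. A minimal standalone sketch of that interplay follows. `FakeResponse` and `render_feed` are hypothetical stand-ins rather than Discourse code, the `SiteSetting` branch is left out, and it assumes `add_noindex_header` runs after the feed action has set its header.

```ruby
# Hypothetical stand-in for a controller response; only headers matter here.
FakeResponse = Struct.new(:headers)

# Simplified version of the guarded helper: apply the default 'noindex'
# only when nothing has set X-Robots-Tag earlier in the request.
# (The SiteSetting check from the real method is omitted in this sketch.)
def add_noindex_header(response, is_get: true)
  return unless is_get && !response.headers['X-Robots-Tag']

  response.headers['X-Robots-Tag'] = 'noindex'
end

# Hypothetical feed action: sets the stricter value, then the shared
# helper runs (assumed ordering) and must leave it untouched.
def render_feed(response)
  response.headers['X-Robots-Tag'] = 'noindex, nofollow'
  add_noindex_header(response)
  response
end

feed = render_feed(FakeResponse.new({}))

other = FakeResponse.new({})
add_noindex_header(other)

puts feed.headers['X-Robots-Tag']   # prints "noindex, nofollow"
puts other.headers['X-Robots-Tag']  # prints "noindex"
```

Without the `!response.headers['X-Robots-Tag']` check, the helper in this sketch would replace `noindex, nofollow` with plain `noindex`, and the `nofollow` directive would be lost.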

@@ -2850,6 +2850,11 @@ RSpec.describe TopicsController do
       get "/t/foo/#{topic.id}.rss"
       expect(response.status).to eq(200)
       expect(response.media_type).to eq('application/rss+xml')
+      # our RSS feed is full of post 1/2/3/4/5 links, we do not want it included
+      # in the index, and do not want links followed
+      # this allows us to remove it while allowing via robots.txt
+      expect(response.headers['X-Robots-Tag']).to eq('noindex, nofollow')
     end
     it 'renders rss of the topic correctly with subfolder' do