Previously, we blocked search engines in tag pages since they may get marked as a duplicate content.
* DEV: block tag inner pages from search engines crawling.
Googlebot handles no-index headers very elegantly. It advises to leave as many routes as possible open and uses headers for high fidelity rules regarding indexes.
Discourse adds special `x-robot-tags` noindex headers to users, badges, groups, search and tag routes.
Following up on b52143feff we now have it so Googlebot gets special handling.
Rest of the crawlers get a far more aggressive disallow list to protect against excessive crawling.
DEV: Replace instances of Discourse.base_uri with Discourse.base_path
This is clearer because the base_uri is actually just a path prefix. This continues the work started in 555f467.
* FEATURE: Allow customization of robots.txt
This allows admins to customize/override the content of the robots.txt
file at /admin/customize/robots. That page is not linked to anywhere in
the UI -- admins have to manually type the URL to access that page.
* use Ember.computed.not
* Jeff feedback
* Feedback
* Remove unused import
This reduces chances of errors where consumers of strings mutate inputs
and reduces memory usage of the app.
Test suite passes now, but there may be some stuff left, so we will run
a few sites on a branch prior to merging
Also moved to default crawl delay bing so no more than a req every 5 seconds is allowed
New site settings:
"slow_down_crawler_user_agents" - list of crawlers that will be slowed down
"slow_down_crawler_rate" - how many seconds to wait between requests
Not enforced server side yet