Commit Graph

55 Commits

Author SHA1 Message Date
Sam Saffron e440ec2519 FIX: crawler requests not tracked for non UTF-8 user agents
Non UTF-8 user_agent requests were bypassing logging due to PG always
wanting UTF-8 strings.

This adds some conversion to ensure we are always dealing with UTF-8
2019-12-09 17:43:51 +11:00
Joffrey JAFFEUX 0d3d2c43a0
DEV: s/\$redis/Discourse\.redis (#8431)
This commit also adds a rubocop rule to prevent global variables.
2019-12-03 10:05:53 +01:00
Sam Saffron 423ad5f0a4 FIX: do not log if an invalid mime type is passed to app
Previously our custom exception handler was unable to handle situations
where an invalid mime type was sent, resulting in a warning log

This ensures we pretend a request is HTML for the purpose of rendering
the error page if an invalid mime type from a scanner is shipped to the app
2019-11-21 15:51:34 +11:00
Sam Saffron 7d389df5e7 DEV: correct spec to allow for new default
b4bfc27b changes the default so the spec should be changed as well.
2019-11-18 16:05:58 +11:00
Penar Musaraj 74869b8a7f FIX: Do not consider mobile app traffic as crawler visits
Followup to a4eb523a
2019-11-04 09:16:50 -05:00
Krzysztof Kotlarek 427d54b2b0 DEV: Upgrading Discourse to Zeitwerk (#8098)
Zeitwerk simplifies working with dependencies in dev and makes it easier reloading class chains. 

We no longer need to use Rails "require_dependency" anywhere and instead can just use standard 
Ruby patterns to require files.

This is a far reaching change and we expect some followups here.
2019-10-02 14:01:53 +10:00
Sam Saffron ed00f35306 FEATURE: improve performance of anonymous cache
This commit introduces 2 features:

1. DISCOURSE_COMPRESS_ANON_CACHE (true|false, default false): this allows
you to optionally compress the anon cache body entries in Redis, can be
useful for high load sites with Redis that lives on a separate server to
to webs

2. DISCOURSE_ANON_CACHE_STORE_THRESHOLD (default 2), only pop entries into
redis if we observe them more than N times. This avoids situations where
a crawler can walk a big pile of topics and store them all in Redis never
to be used. Our default anon cache time for topics is only 60 seconds. Anon
cache is in place to avoid the "slashdot" effect where a single topic is
hit by 100s of people in one minute.
2019-09-04 17:18:32 +10:00
Sam Saffron b9954b53bb FIX: report cached controller and action to loggers
Previously we would treat all cached hits in anon cache as "other"

This hinders analysis of cache performance and makes logging inaccurate
2019-09-03 10:55:16 +10:00
Sam Saffron 08743e8ac0 FEATURE: anon cache reports data to loggers
This allows custom plugins such as prometheus exporter to log how many
requests are stored in the anon cache vs used by the anon cache.

This metric allows us to fine tune cache behaviors
2019-09-02 18:45:35 +10:00
Sam Saffron 62141b6316 FEATURE: enable_performance_http_headers for performance diagnostics
This adds support for DISCOURSE_ENABLE_PERFORMANCE_HTTP_HEADERS
when set to `true` this will turn on performance related headers

```text
X-Redis-Calls: 10     # number of redis calls
X-Redis-Time: 1.02    # redis time in seconds
X-Sql-Commands: 102   # number of SQL commands
X-Sql-Time: 1.02      # duration in SQL in seconds
X-Queue-Time: 1.01    # time the request sat in queue (depends on NGINX)
```

To get queue time NGINX must provide: HTTP_X_REQUEST_START

We do not recommend you enable this without thinking, it exposes information
about what your page is doing, usually you would only enable this if you
intend to strip off the headers further down the stream in a proxy
2019-06-05 16:08:11 +10:00
Penar Musaraj a4eb523af6 Track Discourse user agent pageviews as crawler
Since 5bfe051e, Discourse user agents are marked as non-crawlers (to avoid accidental blacklisting). This makes sure pageviews for these agents are tracked as crawler hits.
2019-05-08 10:38:55 -04:00
Sam Saffron 4ea21fa2d0 DEV: use #frozen_string_literal: true on all spec
This change both speeds up specs (less strings to allocate) and helps catch
cases where methods in Discourse are mutating inputs.

Overall we will be migrating everything to use #frozen_string_literal: true
it will take a while, but this is the first and safest move in this direction
2019-04-30 10:27:42 +10:00
Neil Lalonde 526ffc4966 FIX: error in response body to blocked crawlers, showing 500 Internal Server Error with status of 403 2018-09-14 15:40:20 -04:00
Neil Lalonde b87a089822 FIX: don't block api requests when whitelisted_crawler_user_agents is set 2018-09-14 15:40:20 -04:00
Osama Sayegh 0b7ed8ffaf FEATURE: backend support for user-selectable components
* FEATURE: backend support for user-selectable components

* fix problems with previewing default theme

* rename preview_key => preview_theme_id

* omit default theme from child themes dropdown and try a different fix

* cache & freeze stylesheets arrays
2018-08-08 14:46:34 +10:00
Sam 379384ae1e FIX: never block /srv/status which is used for health checks
This route is also very cheap so blocking it is not required

It is still rate limited and so on elsewhere
2018-07-18 12:37:01 +10:00
OsamaSayegh decf1f27cf FEATURE: Groundwork for user-selectable theme components
* Phase 0 for user-selectable theme components

- Drops `key` column from the `themes` table
- Drops `theme_key` column from the `user_options` table
- Adds `theme_ids` (array of ints default []) column to the `user_options` table and migrates data from `theme_key` to the new column.
- Removes the `default_theme_key` site setting and adds `default_theme_id` instead.
- Replaces `theme_key` cookie with a new one called `theme_ids`
- no longer need Theme.settings_for_client
2018-07-12 14:18:21 +10:00
Sam e72fd7ae4e FIX: move crawler blocking into anon cache
This refinement of previous fix moves the crawler blocking into
anonymous cache

This ensures we never poison the cache incorrectly when blocking crawlers
2018-07-04 11:14:43 +10:00
Neil Lalonde e8a6323bea remove crawler blocking until multisite support 2018-07-03 17:54:45 -04:00
Sam 035312d501 FIX: specify path for dosp cookie 2018-04-24 11:07:58 -04:00
Sam ded84a4b58 PERF: improve performance once logged in rate limiter hits
If "logged in" is being forced anonymous on certain routes, trigger
the protection for any requests that spend 50ms queueing

This means that ...

1. You need to trip it by having 3 requests take longer than 1 second in 10 second interval
2. Once tripped, if your route is still spending 50m queueuing it will continue to be protected

This means that site will continue to function with almost no delays while it is scaling up to handle the new load
2018-04-23 11:55:25 +10:00
Sam 59cd7894d9 FEATURE: if site is under extreme load show anon view
If a particular path is being hit extremely hard by logged on users,
revert to anonymous cached view.

This will only come into effect if 3 requests queue for longer than 2 seconds
on a *single* path.

This can happen if a URL is shared with the entire forum base and everyone
is logged on
2018-04-18 16:58:57 +10:00
Neil Lalonde b87fa6d749 FIX: blacklisted crawlers could get through by omitting the accept header 2018-04-17 12:39:30 -04:00
Sam 9980f18d86 FEATURE: track request queueing as early as possible 2018-04-17 18:06:17 +10:00
Neil Lalonde 7311023a52
Merge pull request #5700 from discourse/crawl-block
FEATURE: control web crawlers access with white/blacklist
2018-03-27 15:06:03 -04:00
Neil Lalonde 4d12ff2e8a when writing cache, remove elements from the user agents list. also return a message and content type when blocking a crawler. 2018-03-27 13:44:14 -04:00
Sam 31dea5d5fc correct flaky spec 2018-03-27 17:57:19 +11:00
Neil Lalonde a84bb81ab5 only applies to get html requests 2018-03-22 17:57:44 -04:00
Neil Lalonde ced7e9a691 FEATURE: control which web crawlers can access using a whitelist or blacklist 2018-03-22 15:41:02 -04:00
Sam f0d5f83424 FEATURE: limit assets less that non asset paths
By default assets can be requested up to 200 times per 10 seconds
from the app, this includes CSS and avatars
2018-03-06 15:20:39 +11:00
Sam Saffron df8e43abdd use lazy & instead of try
unregister ip skipper in test
raise if called when a skipper is in play
2018-02-06 10:38:15 +11:00
Robin Ward eefd226611 Add extensibility point to `request_tracker` to skip IP addresses
This is useful if you want to run a per IP rate limiter but want to be
able to skip some IPs with custom logic.
2018-02-05 17:49:40 -05:00
Sam f26ff290c3 FEATURE: Shorten setting name to max_reqs
So it is consistent with other settings
2018-01-22 13:18:30 +11:00
Sam d7657d8e47 correct specs, ensure crawler layout only applies to html 2018-01-16 16:28:11 +11:00
Sam cecd7d0d07 FEATURE: global rate limiter can bypass local IPs 2018-01-08 08:39:17 +11:00
Sam 4986ebcf24 FEATURE: optional default off global per ip rate limiter 2017-12-11 17:52:57 +11:00
Sam a4c539bade FEATURE: Allow registration of detailed request logger
Detailed request loggers can be used to gather rich timing info
from all requests (which in turn can be forwarded to monitoring solution)

Middleware::RequestTracker.detailed_request_logger(->|env, data| do
   # do stuff with env and data
end
2017-10-18 12:10:30 +11:00
Guo Xiang Tan 5012d46cbd Add rubocop to our build. (#5004) 2017-07-28 10:20:09 +09:00
Sam ac1f84d3e1 SECURITY: theme key should be an anon cache breaker 2017-06-15 09:36:27 -04:00
Sam 39a524aac8 FEATURE: brotli cdn bypass for assets
Allow CDNS that strip out brotli encoding to use brotli regardless
2016-12-05 13:57:09 +11:00
Andy Waite 3e50313fdc Prepare for separation of RSpec helper files
Since rspec-rails 3, the default installation creates two helper files:
* `spec_helper.rb`
* `rails_helper.rb`

`spec_helper.rb` is intended as a way of running specs that do not
require Rails, whereas `rails_helper.rb` loads Rails (as Discourse's
current `spec_helper.rb` does).

For more information:

https://www.relishapp.com/rspec/rspec-rails/docs/upgrade#default-helper-files

In this commit, I've simply replaced all instances of `spec_helper` with
`rails_helper`, and renamed the original `spec_helper.rb`.

This brings the Discourse project closer to the standard usage of RSpec
in a Rails app.

At present, every spec relies on loading Rails, but there are likely
many that don't need to. In a future pull request, I hope to introduce a
separate, minimal `spec_helper.rb` which can be used in tests which
don't rely on Rails.
2015-12-01 20:39:42 +00:00
Neil Lalonde 86cd1a19cc FEATURE: page view stats for mobile view 2015-07-03 17:19:33 -04:00
Sam 771eeea837 fix spec 2015-06-16 10:53:28 +10:00
Arthur Neves b8cbe51026
Convert specs to RSpec 2.99.2 syntax with Transpec
This conversion is done by Transpec 3.1.0 with the following command:
    transpec

* 424 conversions
    from: obj.should
      to: expect(obj).to

* 325 conversions
    from: == expected
      to: eq(expected)

* 38 conversions
    from: obj.should_not
      to: expect(obj).not_to

* 15 conversions
    from: =~ /pattern/
      to: match(/pattern/)

* 9 conversions
    from: it { should ... }
      to: it { is_expected.to ... }

* 5 conversions
    from: lambda { }.should_not
      to: expect { }.not_to

* 4 conversions
    from: lambda { }.should
      to: expect { }.to

* 2 conversions
    from: -> { }.should
      to: expect { }.to

* 2 conversions
    from: -> { }.should_not
      to: expect { }.not_to

* 1 conversion
    from: === expected
      to: be === expected

* 1 conversion
    from: =~ [1, 2]
      to: match_array([1, 2])

For more details: https://github.com/yujinakayama/transpec#supported-conversions
2015-04-25 11:18:35 -04:00
Sam cbe18eb0df FEATURE: allow view exclusion using custom header
Set Discourse-Track-View to either "0" or "false" to exclude request
2015-02-26 11:41:11 +11:00
Sam acda6ebd60 FIX: view tracking needs to release data earlier
retaining data during queuing was causing huge memory spikes
2015-02-10 17:03:33 +11:00
Sam 820ce8765e refactor traffic report
split traffic report in 2, page view vs raw traffic
hide raw traffic report by default
improve flushing logic for application reqs
2015-02-06 14:39:16 +11:00
Sam 08b790b3c2 improve metrics gathered using in our traffic section
this also pulls out the middleware into its own home and inserts in front
2015-02-05 16:08:52 +11:00
Sam c150c55e2d FEATURE: rudimentary view tracking wired in 2015-02-04 16:15:16 +11:00
Luciano Sousa 0fd98b56d8 few components with rspec3 syntax 2015-01-09 13:34:37 -03:00