There have been too many flaky tests as a result of leaking state in
Redis so it is easier to resolve them by ensuring we flush Redis'
database.
Locally on my machine, calling `Discourse.redis.flushdb` takes around
0.1ms which means this change will have very little impact on test
runtimes.
This patch removes the `with_service` helper from the code base.
Instead, we can pass a block with actions directly to the `.call` method
of a service.
This simplifies how to use services:
- use `.call` without a block to run the service and get its result
object.
- use `.call` with a block of actions to run the service and execute
arbitrary code depending on the service outcome.
It also means a service is now “self-contained” and can be used anywhere
without having to include a helper or whatever.
This has been split out from https://github.com/discourse/discourse/pull/28051
so we can use this same code in plugin specs before merging the core PR,
adds some helpers for creating local backup temp files
and cleaning them up.
Sometimes the backtrace is quite big for failing specs, this env var
(RSPEC_EXCLUDE_NOISE_IN_BACKTRACE) can be set to
1 to remove backtrace from anything but spec or application code in
rspec. This makes it easier to see where the actual failure is
coming from, most of the time all the gem paths are noise.
* DEV: Upgrade Rails to 7.1
* FIX: Remove references to `Rails.logger.chained`
`Rails.logger.chained` was provided by Logster before Rails 7.1
introduced their broadcast logger. Now all the loggers are added to
`Rails.logger.broadcasts`.
Some code in our initializers was still using `chained` instead of
`broadcasts`.
* DEV: Make parameters optional to all FakeLogger methods
* FIX: Set `override_level` on Logster loggers (#27519)
A followup to f595d599dd
* FIX: Don’t duplicate Rack response
---------
Co-authored-by: Jarek Radosz <jradosz@gmail.com>
* DEV: Upgrade Rails to 7.1
* FIX: Remove references to `Rails.logger.chained`
`Rails.logger.chained` was provided by Logster before Rails 7.1
introduced their broadcast logger. Now all the loggers are added to
`Rails.logger.broadcasts`.
Some code in our initializers was still using `chained` instead of
`broadcasts`.
* DEV: Make parameters optional to all FakeLogger methods
* FIX: Set `override_level` on Logster loggers (#27519)
A followup to f595d599dd
* FIX: Don’t duplicate Rack response
---------
Co-authored-by: Jarek Radosz <jradosz@gmail.com>
* Revert "FIX: Set `override_level` on Logster loggers (#27519)"
This reverts commit c1b0488c54.
* Revert "DEV: Make parameters optional to all FakeLogger methods"
This reverts commit 3318dad7b4.
* Revert "FIX: Remove references to `Rails.logger.chained`"
This reverts commit f595d599dd.
* Revert "DEV: Upgrade Rails to 7.1"
This reverts commit 081b00391e.
This commit tries another work around for the `Socket::ResolutionError: getaddrinfo: Temporary failure in name resolution`
error we are seeing on CI.
The problem with the previous workaround is that `Capybara.using_session` will attempt to resolve `localhost`
before yielding the block which means our retry code is not hit.
This problem may be related to https://bugs.ruby-lang.org/issues/20172
which hints at us potentially not being able to spin up threads on CI so
I'm adding a debugging statement when stuff fails.
This is a follow up to 9ff0805a1d. We
noticed that `localhost` can fail to resolve in other spots of the app
and not just in selenium-webdriver.
From the failing tests we have seen, the `getaddrinfo: Temporary failure in name resolution` error is only
seen from within the `Capybara.using_session` block. This commit aims to
ensure that `localhost` can be resolve after the new session is started.
On Github Actions, system tests which uses `Capybara#using_session` are
failing intermittently with the error "Socket::ResolutionError: getaddrinfo: Temporary failure in name resolution"
when `Selenium::WebDriver::Platform.localhost` tries to resolve
`localhost`.
Too much time has been spent trying to figure out why so we are giving
up here and just retrying the resolution of `localhost` on Github
Actions.
We are seeing a weird resolution error on Github actions with the
following backtrace:
```
Failure/Error:
visit File.join(
GlobalSetting.relative_url_root || "",
"/session/#{user.encoded_username}/become.json?redirect=false",
)
Socket::ResolutionError:
getaddrinfo: Temporary failure in name resolution
```
Switch to use `127.0.0.1` instead of forcing a name resolution.
- Run the CSP-nonce-related middlewares on the generated response
- Fix the readonly mode checking to avoid empty strings being passed (the `check_readonly_mode` before_action will not execute in the case of these re-dispatched exceptions)
- Move the BlockRequestsMiddleware cookie-setting to the middleware, so that it is included even for unusual HTML responses like these exceptions
This is only required in rails_helper, otherwise it is
not loaded. Allows for better debugging by allowing
navigation of the call stack from the point of `binding.pry`
c.f. https://github.com/pry/pry-stack_explorer
This is to enable :array type attributes for Contract
attributes in services, this is a followup to the move
of services from chat to core here:
cab178a405
Co-authored-by: Joffrey JAFFEUX <j.jaffeux@gmail.com>
Followup 0bbca318f2,
rather than making developers provide the plugin path
name (which may not always be the same depending on
dir names and git cloning etc) we can infer the plugin
dir from the caller in plugin_file_from_fixtures
This allows plugins to also easily read fixture
files for tests, rather than having to do stuff
like this:
```
File.open(File.join(__dir__, "../../../fixtures/100x100.jpg"))
```
Why this change?
Google does not yet publish binaries for chrome and chromedriver for
`linux/arm64`. In 484954ec4c, we attempted
to add support for running system tests on `linux/arm64` by switching to
Firefox but our system tests seem to make lots of assumptions about
running on chromium based browsers so there are some tests that don't work in Firefox.
This commit works around the lack of chrome and chromedriver binaries by
doing the following:
1. Adds a `DISCOURSE_SYSTEM_TEST_CHROMIUM` ENV variable which when set to
`1` will allow us to run system tests using a chromium binary. Chromium
binaries for `linux/arm64` are available and since Chrome is Chromium based, all of our
system tests "should pass" even when running against a Chromium binary. I don't expect
this to be perfect but I expect it to be better than running against Firefox. This change buys us time
until Chrome finally ships binaries for `linux/arm64`.
2. Adds a `DISCOURSE_SYSTEM_TEST_CHROMEDRIVER_PATH` ENV variable to
allow the chromedriver path to be configured. We need this because
the [electron project](https://github.com/electron/electron/releases) actually
releases chromewebdriver for `linux/arm64` so someone running
`linux/arm64` can download the necessary chromedriver from the
project instead of relying on selenium-manager.
This change is also important for us to support [discourse_test](https://github.com/discourse/discourse_docker/blob/main/image/discourse_test/Dockerfile) and [discourse_dev](https://github.com/discourse/discourse_docker/blob/main/image/discourse_dev/Dockerfile) images targeted at `linux/arm64`.
In rspec request specs, we do a huge verbose backtrace
when there is an error. However, 99% of the time you don't
care about pages and pages of activesupport/rspec gem
LOC in the backtrace...so this commit introduces an
env var RSPEC_EXCLUDE_GEMS_IN_BACKTRACE to allow for
turning this off.
Why this change?
This reverts 725561cf4b as it did not
address the root cause of the problem even though it fixed the failing tests we were seeing
when running `bundle exec rspec --tag ~type:multisite --order random:776 spec/system/admin_customize_form_templates_spec.rb spec/system/admin_sidebar_navigation_spec.rb spec/system/admin_site_setting_search_spec.rb spec/system/composer/dont_feed_the_trolls_popup_spec.rb spec/system/composer/review_media_unless_trust_level_spec.rb spec/system/create_account_spec.rb spec/system/editing_sidebar_tags_navigation_spec.rb spec/system/email_change_spec.rb spec/system/emojis/emoji_deny_list_spec.rb spec/system/group_activity_spec.rb spec/system/hashtag_autocomplete_spec.rb spec/system/network_disconnected_spec.rb spec/system/post_menu_spec.rb spec/system/post_small_action_spec.rb spec/system/tags_intersection_spec.rb spec/system/topic_list_focus_spec.rb spec/system/topic_page_spec.rb spec/system/user_page/user_profile_info_panel_spec.rb spec/system/viewing_group_members_spec.rb spec/system/viewing_navigation_menu_preferences_spec.rb`.
The root cause here is that `before_action`s added to a controller is
order dependent. As such, some requests were not setting the cookie
because the `before_action` callback was not even hit as a prior
`before_action` callbacks has raised an error such as the `check_xhr`
`before_action` callback.
To resolve the problem, we need to add the `prepend: true` option in
our monkey patch of `ApplicationController` to ensure that the
`before_action` callback which we have added is always run first.
This change also makes a couple of changes:
1. Improve the response body when a request is blocked by the `BlockRequestsMiddleware` middleware
so that it makes debugging easier.
2. Only set the cookies for non-xhr HTML format requests. Setting it for
other formats is kind of pointless.
Why this change?
We noticed that running `LOAD_PLUGINS=1 rspec --seed=38855 plugins/chat/spec/system/chat_new_message_spec.rb` locally
results in the system tests randomly failing. When we inspected the
request logs closely, we noticed that a `/presence/get` request from a
previous rspec example was being processed when a new rspec example is
already being run. We know it was from the previous rspec example
because inspecting the auth token showed the request using the auth
token of a user from the previous example. However, when a request using
an auth token from a previous example is used it ends up logging out the
same user on the server side because the user id in the cookie is the same
due to the use of `fab!`.
I did some research and there is apparently no way to wait until all
inflight requests by the browser has completed through capybara or
selenium. Therefore, we will add an identifier by attaching a cookie to all non-xhr requests so that
xhr requests which are triggered subsequently will contain the cookie in the request.
In the `BlockRequestsMiddleware` middleware, we will then reject any
requests when the value of the identifier in the cookie does not match the current rspec's example
location.
To see the problem locally, change `Auth::DefaultCurrentUserProvider.find_v1_auth_cookie` to the following:
```
def self.find_v1_auth_cookie(env)
return env[DECRYPTED_AUTH_COOKIE] if env.key?(DECRYPTED_AUTH_COOKIE)
env[DECRYPTED_AUTH_COOKIE] = begin
request = ActionDispatch::Request.new(env)
cookie = request.cookies[TOKEN_COOKIE]
# don't even initialize a cookie jar if we don't have a cookie at all
if cookie&.valid_encoding? && cookie.present?
puts "#{env["REQUEST_PATH"]} #{request.cookie_jar.encrypted[TOKEN_COOKIE]&.with_indifferent_access}"
request.cookie_jar.encrypted[TOKEN_COOKIE]&.with_indifferent_access
end
end
end
```
After which run the following command: `LOAD_PLUGINS=1 rspec --format documentation --seed=38855 plugins/chat/spec/system/chat_new_message_spec.rb`
It takes a few tries but the last spec should fail and you should see something like this:
```
assets/chunk.c16f6ba8b6824baa47ac.d41d8cd9.js {"token"=>"37d995a4b65395d3b343ec70fff915b4", "user_id"=>3382, "username"=>"bruce0", "trust_level"=>1, "issued_at"=>1708591735}
/assets/chunk.050148142e1d2dc992dd.d41d8cd9.js {"token"=>"37d995a4b65395d3b343ec70fff915b4", "user_id"=>3382, "username"=>"bruce0", "trust_level"=>1, "issued_at"=>1708591735}
/chat/api/channels/527/messages {"token"=>"37d995a4b65395d3b343ec70fff915b4", "user_id"=>3382, "username"=>"bruce0", "trust_level"=>1, "issued_at"=>1708591735}
/uploads/default/test_0/optimized/1X/_129430568242d1b7f853bb13ebea28b3f6af4e7_2_512x512.png {"token"=>"37d995a4b65395d3b343ec70fff915b4", "user_id"=>3382, "username"=>"bruce0", "trust_level"=>1, "issued_at"=>1708591735}
redirects to existing chat channel
redirects to chat channel if recipients param is missing (PENDING: Temporarily skipped with xit)
with multiple users
/favicon.ico {"token"=>"9a75c114c4d3401509a23d240f0a46d4", "user_id"=>3382, "username"=>"bruce0", "trust_level"=>1, "issued_at"=>1708591736}
/chat/new-message {"token"=>"9a75c114c4d3401509a23d240f0a46d4", "user_id"=>3382, "username"=>"bruce0", "trust_level"=>1, "issued_at"=>1708591736}
/presence/get {"token"=>"37d995a4b65395d3b343ec70fff915b4", "user_id"=>3382, "username"=>"bruce0", "trust_level"=>1, "issued_at"=>1708591735}
```
Note how the `/presence/get` request is using a token from the previous example.
Co-authored-by: David Taylor <david@taylorhq.com>
Why this change?
This is a follow up to c30aeafd9d. The
commit was calling `BlockRequestsMiddleware.allow_requests!` only before
`type: :system` tests but non system type tests could be running as well
and needs the `BlockRequestsMiddleware.allow_requests!` middleware to be
disabled too.
Why this change?
We have been debugging flaky system tests and noticed in https://github.com/discourse/discourse/actions/runs/7911902047/job/21596791343?pr=25690
that ActiveRecord connection checkout timeouts are encountered because
the Capybara server thread is processing requests even after
`Capybara.reset_session!` and ActiveRecord's `teardown_fixtures` have already been call.
The theory here is that an inflight request can still hit the Capybara
server even after `Capybara.reset_session!` has been called and end up
eating up an ActiveRecord connection for too long and also messing with
the database outside of a transaction.
What does this change do?
This change adds a `BlockRequestsMiddleware` middleware in the test
environment which is enabled to reject all incoming requests at the end
of each system test and before `Capybara.reset_session!` is called. At
the start of each RSpec test, the middleware is disabled again.
Why this change?
On CI, we have been seeing flaky system tests because ActiveRecord is
unable to checkout a connection. This patch is meant to help us debug
which thread is not returning the connection to the queue.
Why this change?
We are getting the following error on CI:
`Text file busy -
/github/home/.cache/selenium/chromedriver/linux64/121.0.6167.85/chromedriver`
This happens when two processes tries to download the chromedriver at
the same time. I'm not entirely sure why the previous implementatio is
not locking properly since we still saw the `Text file busy` error but
by accquring the lock before we even check for the existence of
`~/.cache/selenium`, we should be able to eliminate the chance of two
processes trying to download the chromedriver binary at the same time.
Why this change?
In workflow runs, we have seen processes being stuck on a flock lock and
I'm guessing because we are using `"w"` when opening the file which the
ruby documentation advises against as it states "don't use "w" because it truncates the file before lock."
Stuck workflow run: https://github.com/discourse/discourse/actions/runs/7690278010/job/20953851469
This patch allows running system specs on an aarch64 Linux system
(typically our `discourse_dev` docker image).
As Chrome isn’t available for the aarch64 architecture (yet), we have to
rely on Firefox instead. This has some drawbacks like not being able to
access the browser logs like we do with the Chrome webdriver.
Why this change?
We have been seeing specs time out on GitHub CI but the problem is that
we are unable to debug those timeouts due to a lack of information. This
change seeks to print out the backtraces of all threads right before a
spec times out on CI.
What does this change do?
1. Starts a thread on CI which will wait for a spec to start running.
1. Once a spec starts running, the thread will sleep for
`PER_SPEC_TIMEOUT_SECONDS -1` seconds.
1. After sleeping, the thread checks if the spec is still running and
prints out the backtraces of all threads if it is. Otherwise, the
thread does nothing and runs the next loop.
1. At the end of each spec run, we ensure that the thread is in a
waiting state for the next spec run to start.
Note that there is no need for us to teardown or cleanup the thread
since the process terminates after running all the tests.
Why this change?
We have been looking into a flaky system tests in one of our plugins
where the DB transaction flow can be messed up from time to time. Our
debugging effort is complicated by that fact that `test-prof` starts a
DB transaction in a `before(:all)` block which makes it hard to properly
log information. By allowing test-prof to be disabled completely, we
believe it will make it easier for us to isolate the problem we are
investigating.
What does this change do?
1. Avoid loading test-prof files if `PREFABRICATION` env has been set to
`0`.
2. Set `PREFABRICATION=0` for plugin system tests in Github actions
What we have now:
```
~~~~~~~ SERVER EXCEPTIONS ~~~~~~~Error encountered while proccessing /tag/tag24/l/latest ArgumentError: wrong number of arguments (given 1, expected 0) /__w/discourse/discourse/lib/site_setting_extension.rb:521:in `block in setup_methods'
```
What we actually want:
```
~~~~~~~ SERVER EXCEPTIONS ~~~~~~~
Error encountered while proccessing /tag/tag24/l/latest ArgumentError: wrong number of arguments (given 1, expected 0) /__w/discourse/discourse/lib/site_setting_extension.rb:521:in `block in setup_methods'
```
Why this change?
We have been running into flaky tests which seems to be related to
AR transaction problems. However, we are not able to reproduce this
locally and do not have sufficient information on our builds now to
debug the problem.
What does this change do?
Noe the following changes only applies when `ENV["GITHUB_ACTIONS"]` is
present.
This change introduces an RSpec around hook when `capture_log: true` has
been set for a test. The responsibility of the hook is to capture the
ActiveRecord debug logs and print them out.
Why this change?
When running system tests on our CI, we have been occasionally seeing
server errors like:
```
Error encountered while proccessing /stylesheets/desktop_e58cf7f686aab173f9b778797f241913c2833c39.css
NoMethodError: undefined method `+' for nil:NilClass
/__w/discourse/discourse/vendor/bundle/ruby/3.2.0/gems/actionpack-7.0.7/lib/action_dispatch/journey/path/pattern.rb:139:in `[]'
/__w/discourse/discourse/vendor/bundle/ruby/3.2.0/gems/actionpack-7.0.7/lib/action_dispatch/journey/router.rb:127:in `block (2 levels) in find_routes'
/__w/discourse/discourse/vendor/bundle/ruby/3.2.0/gems/actionpack-7.0.7/lib/action_dispatch/journey/router.rb:126:in `each'
/__w/discourse/discourse/vendor/bundle/ruby/3.2.0/gems/actionpack-7.0.7/lib/action_dispatch/journey/router.rb:126:in `each_with_index'
/__w/discourse/discourse/vendor/bundle/ruby/3.2.0/gems/actionpack-7.0.7/lib/action_dispatch/journey/router.rb:126:in `block in find_routes'
/__w/discourse/discourse/vendor/bundle/ruby/3.2.0/gems/actionpack-7.0.7/lib/action_dispatch/journey/router.rb:123:in `map!'
/__w/discourse/discourse/vendor/bundle/ruby/3.2.0/gems/actionpack-7.0.7/lib/action_dispatch/journey/router.rb:123:in `find_routes'
/__w/discourse/discourse/vendor/bundle/ruby/3.2.0/gems/actionpack-7.0.7/lib/action_dispatch/journey/router.rb:32:in `serve'
/__w/discourse/discourse/vendor/bundle/ruby/3.2.0/gems/actionpack-7.0.7/lib/action_dispatch/routing/route_set.rb:852:in `call'
```
While looking through various Rails issues related to the error above, I
came across https://github.com/rails/rails/pull/27647 which is a fix to
fully initialize routes before the first request is handled. However,
the routes are only fully initialize only if `config.eager_load` is set
to `true`. There is no reason why `config.eager_load` shouldn't be `true` in the
CI environment and this is what a new Rails 7.1 app is generated with.
What does this change do?
Enable `config.eager_load` when `env["CI"]` is present