DEV: Introduce automatic reruns to RSpec tests on Github actions (#24811)

What motivated this change?

Our builds on Github actions have been extremely flaky mostly due to system tests. This has led to a drop in confidence
in our test suite where our developers tend to assume that a failed job is due to a flaky system test. As a result, we
have had occurrences where changes that resulted in legitimate test failures are merged into the `main` branch because developers
assumed it was a flaky test.

What does this change do?

This change seeks to reduce the flakiness of our builds on Github Actions by automatically re-running RSpec tests once when
they fail. If a failed test passes subsequently in the re-run, we mark the test as flaky by logging it into a file on disk
which is then uploaded as an artifact of the Github workflow run. We understand that automatically re-runs will lead to 
lower accuracy of our tests but we accept this as an acceptable trade-off since a fragile build has a much greater impact
on our developers' time. Internally, the Discourse development team will be running a service to fetch the flaky tests 
which have been logged for internal monitoring.

How is the change implemented?

1. A `--retry-and-log-flaky-tests` CLI flag is added to the `bin/turbo_rspec` CLI which will then initialize `TurboTests::Runner` 
with the `retry_and_log_flaky_tests` kwarg set to `true`. 

2. When the `retry_and_log_flaky_tests` kwarg is set to `true` for `TurboTests::Runner`, we will register an additional 
formatter `Flaky::FailuresLoggerFormatter` to the `TurboTests::Reporter` in the `TurboTests::Runner#run` method. 
The `Flaky::FailuresLoggerFormatter` has a simple job of logging all failed examples to a file on disk when running all the 
tests. The details of the failed example which are logged can be found in `TurboTests::Flaky::FailedExample.to_h`.

3. Once all the tests have been run once, we check the result for any failed examples and if there are, we read the file on
disk to fetch the `location_rerun_location` of the failed examples which is then used to run the tests in a new RSpec process.
In the rerun, we configure a `TurboTests::Flaky::FlakyDetectorFormatter` with RSpec which removes all failed examples from the log file on disk since those examples are not flaky tests. Note that if there are too many failed examples on the first run, we will deem the failures to likely not be due to flaky tests and not re-run the test failures. As of writing, the threshold of failed examples is set to 10. If there are more than 10 failed examples, we will not re-run the failures.
This commit is contained in:
Alan Guo Xiang Tan 2023-12-13 07:18:27 +08:00 committed by GitHub
parent 5055e431a8
commit 39da9106ba
No known key found for this signature in database
GPG Key ID: 4AEE18F83AFDEB23
12 changed files with 436 additions and 61 deletions

View File

@ -233,11 +233,11 @@ jobs:
- name: Core RSpec
if: matrix.build_type == 'backend' && matrix.target == 'core'
run: bin/turbo_rspec --use-runtime-info --verbose --format documentation
run: bin/turbo_rspec --use-runtime-info --verbose --format documentation --retry-and-log-flaky-tests
- name: Plugin RSpec
if: matrix.build_type == 'backend' && matrix.target == 'plugins'
run: bin/rake plugin:turbo_spec['*','--verbose --format documentation --use-runtime-info']
run: bin/rake plugin:turbo_spec['*','--verbose --format documentation --use-runtime-info --retry-and-log-flaky-tests']
- name: Plugin QUnit
if: matrix.build_type == 'frontend' && matrix.target == 'plugins'
@ -265,35 +265,68 @@ jobs:
- name: Core System Tests
if: matrix.build_type == 'system' && matrix.target == 'core'
run: RAILS_ENABLE_TEST_LOG=1 RAILS_TEST_LOG_LEVEL=error PARALLEL_TEST_PROCESSORS=4 bin/turbo_rspec --use-runtime-info --profile=50 --verbose --format documentation spec/system
run: RAILS_ENABLE_TEST_LOG=1 RAILS_TEST_LOG_LEVEL=error PARALLEL_TEST_PROCESSORS=4 bin/turbo_rspec --use-runtime-info --profile=50 --verbose --format documentation --retry-and-log-flaky-tests spec/system
- name: Plugin System Tests
if: matrix.build_type == 'system' && matrix.target == 'plugins'
run: |
GLOBIGNORE="plugins/chat/*";
LOAD_PLUGINS=1 RAILS_ENABLE_TEST_LOG=1 RAILS_TEST_LOG_LEVEL=error PARALLEL_TEST_PROCESSORS=4 bin/turbo_rspec --use-runtime-info --profile=50 --verbose --format documentation plugins/*/spec/system
LOAD_PLUGINS=1 RAILS_ENABLE_TEST_LOG=1 RAILS_TEST_LOG_LEVEL=error PARALLEL_TEST_PROCESSORS=4 bin/turbo_rspec --use-runtime-info --profile=50 --verbose --format documentation --retry-and-log-flaky-tests plugins/*/spec/system
shell: bash
timeout-minutes: 30
- name: Chat System Tests
if: matrix.build_type == 'system' && matrix.target == 'chat'
run: LOAD_PLUGINS=1 RAILS_ENABLE_TEST_LOG=1 RAILS_TEST_LOG_LEVEL=error PARALLEL_TEST_PROCESSORS=4 bin/turbo_rspec --use-runtime-info --profile=50 --verbose --format documentation plugins/chat/spec/system
run: LOAD_PLUGINS=1 RAILS_ENABLE_TEST_LOG=1 RAILS_TEST_LOG_LEVEL=error PARALLEL_TEST_PROCESSORS=4 bin/turbo_rspec --use-runtime-info --profile=50 --verbose --format documentation --retry-and-log-flaky-tests plugins/chat/spec/system
timeout-minutes: 30
- name: Theme System Tests
if: matrix.build_type == 'system' && matrix.target == 'themes'
run: |
RAILS_ENABLE_TEST_LOG=1 RAILS_TEST_LOG_LEVEL=error PARALLEL_TEST_PROCESSORS=4 bin/turbo_rspec --profile=50 --verbose --format documentation tmp/themes/*/spec/system
RAILS_ENABLE_TEST_LOG=1 RAILS_TEST_LOG_LEVEL=error PARALLEL_TEST_PROCESSORS=4 bin/turbo_rspec --profile=50 --verbose --format documentation --retry-and-log-flaky-tests tmp/themes/*/spec/system
shell: bash
timeout-minutes: 30
- name: Check for failed system test screenshots
id: check-failed-system-test-screenshots
if: always() && matrix.build_type == 'system'
run: |
if [ -d tmp/capybara ] && [ "$(ls -A tmp/capybara/)" ]; then
echo "exists=true" >> $GITHUB_OUTPUT
else
echo "exists=false" >> $GITHUB_OUTPUT
fi
shell: bash
- name: Upload failed system test screenshots
uses: actions/upload-artifact@v3
if: matrix.build_type == 'system' && failure()
if: always() && steps.check-failed-system-test-screenshots.outputs.exists == 'true'
with:
name: failed-system-test-screenshots
path: tmp/capybara/*.png
- name: Check for flaky tests report
id: check-flaky-spec-report
if: always()
run: |
if [ -f tmp/turbo_rspec_flaky_tests.json ]; then
echo "exists=true" >> $GITHUB_OUTPUT
else
echo "exists=false" >> $GITHUB_OUTPUT
fi
shell: bash
- name: Create flaky tests report artifact
if: always() && steps.check-flaky-spec-report.outputs.exists == 'true'
run: cp tmp/turbo_rspec_flaky_tests.json tmp/turbo_rspec_flaky_tests-${{ matrix.build_type }}-${{ matrix.target }}.json
- name: Upload flaky tests report
uses: actions/upload-artifact@v3
if: always() && steps.check-flaky-spec-report.outputs.exists == 'true'
with:
name: flaky-test-reports
path: tmp/turbo_rspec_flaky_tests-${{ matrix.build_type }}-${{ matrix.target }}.json
- name: Check Annotations
if: matrix.build_type == 'annotations'
run: |

View File

@ -16,6 +16,7 @@ profile_print_slowest_examples_count = 10
use_runtime_info = nil
enable_system_tests = nil
disable_system_tests = nil
retry_and_log_flaky_tests = false
OptionParser
.new do |opts|
@ -51,6 +52,11 @@ OptionParser
fail_fast = (n.nil? || n < 1) ? 1 : n
end
opts.on(
"--retry-and-log-flaky-tests",
"Retry failed tests and log if test is deemed to be flaky",
) { retry_and_log_flaky_tests = true }
opts.on("--seed SEED", "The seed for the random order") { |s| seed = s.to_i }
opts.on("--use-runtime-info", "Use runtime info for tests group splitting") do
@ -67,7 +73,6 @@ OptionParser
end
.parse!(ARGV)
if enable_system_tests && disable_system_tests
STDERR.puts "Error: I'm not sure how to enable and disable system tests simultaneously"
exit 1
@ -118,6 +123,7 @@ success =
seed: seed.to_s,
profile:,
profile_print_slowest_examples_count:,
retry_and_log_flaky_tests:,
)
if success

View File

@ -63,10 +63,11 @@ module TurboTests
:metadata,
:location_rerun_argument,
:process_id,
:command_string,
)
class FakeExample
def self.from_obj(obj, process_id)
def self.from_obj(obj, process_id:, command_string:)
obj = obj.symbolize_keys
metadata = obj[:metadata].symbolize_keys
@ -86,6 +87,7 @@ module TurboTests
metadata,
obj[:location_rerun_argument],
process_id,
command_string,
)
end

View File

@ -0,0 +1,67 @@
# frozen_string_literal: true
module TurboTests
module Flaky
class FailedExample
# @param [TurboTests::FakeExample] failed_example
def initialize(failed_example)
@failed_example = failed_example
@failed_example_notification = failed_example.notification
end
# See https://www.rubydoc.info/gems/rspec-core/RSpec%2FCore%2FNotifications%2FFailedExampleNotification:message_lines
def message_lines
lines = @failed_example_notification.message_lines.join("\n")
# Strip ANSI color codes from the message lines as we are likely running in a terminal where `RSpec.color` is enabled
lines.gsub!(/\e\[[0-9;]*m/, "")
lines.strip!
lines
end
# See https://www.rubydoc.info/gems/rspec-core/RSpec%2FCore%2FNotifications%2FFailedExampleNotification:description
def description
@failed_example_notification.description
end
# See https://www.rubydoc.info/gems/rspec-core/RSpec%2FCore%2FNotifications%2FFailedExampleNotification:formatted_backtrace
def backtrace
@failed_example_notification.formatted_backtrace
end
def location_rerun_argument
@failed_example.location_rerun_argument
end
SCREENSHOT_PREFIX = "[Screenshot Image]: "
# Unfortunately this has to be parsed from the output because `ActionDispatch` is just printing the path instead of
# properly adding the screenshot to the test metadata.
def failure_screenshot_path
@failed_example_notification.message_lines.each do |message_line|
if message_line.strip.start_with?(SCREENSHOT_PREFIX)
return message_line.split(SCREENSHOT_PREFIX).last.chomp
end
end
nil
end
def rerun_command
@failed_example.command_string
end
def to_h
{
message_lines:,
description:,
backtrace:,
failure_screenshot_path:,
location_rerun_argument:,
rerun_command:,
}
end
end
end
end

View File

@ -0,0 +1,13 @@
# frozen_string_literal: true
module TurboTests
module Flaky
# This class is responsible for logging the failed examples in JSON format with the necessary debugging information
# to debug the test failures. See `TurboTests::Flaky::FailedExample#to_h` for the debugging information that we log.
class FailuresLoggerFormatter
def dump_summary(notification, _timings)
Manager.log_potential_flaky_tests(notification.failed_examples)
end
end
end
end

View File

@ -0,0 +1,13 @@
# frozen_string_literal: true
module TurboTests
module Flaky
class FlakyDetectorFormatter < RSpec::Core::Formatters::BaseFormatter
RSpec::Core::Formatters.register self, :dump_failures
def dump_failures(notification)
Manager.remove_example(notification.failed_examples)
end
end
end
end

View File

@ -0,0 +1,50 @@
# frozen_string_literal: true
module TurboTests
module Flaky
class Manager
PATH = Rails.root.join("tmp/turbo_rspec_flaky_tests.json")
def self.potential_flaky_tests
JSON
.parse(File.read(PATH))
.map { |failed_example| failed_example["location_rerun_argument"] }
end
# This method should only be called by a formatter registered with `TurboTests::Runner` and logs the failed examples
# to `PATH`. See `FailedExample#to_h` for the details of each example that is logged.
#
# @param [Array<TurboTests::FakeExample>] failed_examples
def self.log_potential_flaky_tests(failed_examples)
return if failed_examples.empty?
File.open(PATH, "w") do |file|
file.puts(
failed_examples.map { |failed_example| FailedExample.new(failed_example).to_h }.to_json,
)
end
end
# This method should only be called by a formatter registered with `RSpec::Core::Formatters.register` and removes
# the given examples from the log file at `PATH` by matching the `location_rerun_argument` of each example.
#
# @param [Array<RSpec::Core::Example>] failed_examples
def self.remove_example(failed_examples)
flaky_tests =
JSON
.parse(File.read(PATH))
.reject do |failed_example|
failed_examples.any? do |example|
failed_example["location_rerun_argument"] == example.location_rerun_argument
end
end
if flaky_tests.present?
File.write(PATH, flaky_tests.to_json)
else
File.delete(PATH)
end
end
end
end
end

View File

@ -0,0 +1,55 @@
# frozen_string_literal: true
module TurboTests
class JsonExample
def initialize(rspec_example)
@rspec_example = rspec_example
end
def to_json
{
execution_result: execution_result_to_json(@rspec_example.execution_result),
location: @rspec_example.location,
full_description: @rspec_example.full_description,
metadata: {
shared_group_inclusion_backtrace:
@rspec_example.metadata[:shared_group_inclusion_backtrace].map(
&method(:stack_frame_to_json)
),
extra_failure_lines: @rspec_example.metadata[:extra_failure_lines],
run_duration_ms: @rspec_example.metadata[:run_duration_ms],
process_pid: Process.pid,
},
location_rerun_argument: @rspec_example.location_rerun_argument,
}
end
private
def stack_frame_to_json(frame)
{ shared_group_name: frame.shared_group_name, inclusion_location: frame.inclusion_location }
end
def exception_to_json(exception)
if exception
{
class_name: exception.class.name.to_s,
backtrace: exception.backtrace,
message: exception.message,
cause: exception_to_json(exception.cause),
}
end
end
def execution_result_to_json(result)
{
example_skipped?: result.example_skipped?,
pending_message: result.pending_message,
status: result.status,
pending_fixed?: result.pending_fixed?,
exception: exception_to_json(result.exception),
pending_exception: exception_to_json(result.pending_exception),
}
end
end
end

View File

@ -19,58 +19,16 @@ module TurboTests
@output = output
end
def exception_to_json(exception)
if exception
{
class_name: exception.class.name.to_s,
backtrace: exception.backtrace,
message: exception.message,
cause: exception_to_json(exception.cause),
}
end
end
def execution_result_to_json(result)
{
example_skipped?: result.example_skipped?,
pending_message: result.pending_message,
status: result.status,
pending_fixed?: result.pending_fixed?,
exception: exception_to_json(result.exception),
pending_exception: exception_to_json(result.pending_exception),
}
end
def stack_frame_to_json(frame)
{ shared_group_name: frame.shared_group_name, inclusion_location: frame.inclusion_location }
end
def example_to_json(example)
{
execution_result: execution_result_to_json(example.execution_result),
location: example.location,
full_description: example.full_description,
metadata: {
shared_group_inclusion_backtrace:
example.metadata[:shared_group_inclusion_backtrace].map(&method(:stack_frame_to_json)),
extra_failure_lines: example.metadata[:extra_failure_lines],
run_duration_ms: example.metadata[:run_duration_ms],
process_pid: Process.pid,
},
location_rerun_argument: example.location_rerun_argument,
}
end
def example_passed(notification)
output_row(type: :example_passed, example: example_to_json(notification.example))
output_row(type: :example_passed, example: JsonExample.new(notification.example).to_json)
end
def example_pending(notification)
output_row(type: :example_pending, example: example_to_json(notification.example))
output_row(type: :example_pending, example: JsonExample.new(notification.example).to_json)
end
def example_failed(notification)
output_row(type: :example_failed, example: example_to_json(notification.example))
output_row(type: :example_failed, example: JsonExample.new(notification.example).to_json)
end
def seed(notification)

View File

@ -44,7 +44,7 @@ module TurboTests
Kernel.const_get(name)
end
@formatters << formatter_class.new(output)
add_formatter(formatter_class.new(output))
end
end
@ -111,6 +111,10 @@ module TurboTests
delegate_to_formatters(:close, RSpec::Core::Notifications::NullNotification)
end
def add_formatter(formatter)
@formatters << formatter
end
protected
def delegate_to_formatters(method, *args)

View File

@ -10,6 +10,7 @@ module TurboTests
verbose = opts.fetch(:verbose, false)
fail_fast = opts.fetch(:fail_fast, nil)
use_runtime_info = opts.fetch(:use_runtime_info, false)
retry_and_log_flaky_tests = opts.fetch(:retry_and_log_flaky_tests, false)
STDOUT.puts "VERBOSE" if verbose
@ -37,6 +38,7 @@ module TurboTests
use_runtime_info: use_runtime_info,
seed: seed,
profile: opts[:profile],
retry_and_log_flaky_tests: retry_and_log_flaky_tests,
).run
end
@ -56,6 +58,7 @@ module TurboTests
@use_runtime_info = opts[:use_runtime_info]
@seed = opts[:seed]
@profile = opts[:profile]
@retry_and_log_flaky_tests = opts[:retry_and_log_flaky_tests]
@failure_count = 0
@messages = Queue.new
@ -76,6 +79,8 @@ module TurboTests
setup_tmp_dir
@reporter.add_formatter(Flaky::FailuresLoggerFormatter.new) if @retry_and_log_flaky_tests
subprocess_opts = { record_runtime: @use_runtime_info }
start_multisite_subprocess(@files, **subprocess_opts)
@ -90,6 +95,17 @@ module TurboTests
@threads.each(&:join)
if @retry_and_log_flaky_tests && @reporter.failed_examples.present?
retry_failed_examples_threshold = 10
if @reporter.failed_examples.length <= retry_failed_examples_threshold
STDOUT.puts "Retrying failed examples and logging flaky tests..."
return rerun_failed_examples(@reporter.failed_examples)
else
STDOUT.puts "Retry and log flaky tests was enabled but ignored because there are more than #{retry_failed_examples_threshold} failures."
end
end
@reporter.failed_examples.empty? && !@error
end
@ -126,6 +142,21 @@ module TurboTests
FileUtils.mkdir_p("tmp/test-pipes/")
end
def rerun_failed_examples(failed_examples)
command = [
"bundle",
"exec",
"rspec",
"--format",
"documentation",
"--format",
"TurboTests::Flaky::FlakyDetectorFormatter",
*Flaky::Manager.potential_flaky_tests,
]
system(*command)
end
def start_multisite_subprocess(tests, **opts)
start_subprocess({}, %w[--tag type:multisite], tests, "multisite", **opts)
end
@ -177,11 +208,11 @@ module TurboTests
env["DISCOURSE_RSPEC_PROFILE_EACH_EXAMPLE"] = "1" if @profile
if @verbose
command_str = [env.map { |k, v| "#{k}=#{v}" }.join(" "), command.join(" ")].join(" ")
command_string = [env.map { |k, v| "#{k}=#{v}" }.join(" "), command.join(" ")].join(" ")
if @verbose
STDOUT.puts "::group::[#{process_id}] Run RSpec" if ENV["GITHUB_ACTIONS"]
STDOUT.puts "Process #{process_id}: #{command_str}"
STDOUT.puts "Process #{process_id}: #{command_string}"
STDOUT.puts "::endgroup::" if ENV["GITHUB_ACTIONS"]
end
@ -194,6 +225,7 @@ module TurboTests
message = JSON.parse(line)
message = message.symbolize_keys
message[:process_id] = process_id
message[:command_string] = command_string
@messages << message
end
end
@ -231,13 +263,31 @@ module TurboTests
message = @messages.pop
case message[:type]
when "example_passed"
example = FakeExample.from_obj(message[:example], message[:process_id])
example =
FakeExample.from_obj(
message[:example],
process_id: message[:process_id],
command_string: message[:command_string],
)
@reporter.example_passed(example)
when "example_pending"
example = FakeExample.from_obj(message[:example], message[:process_id])
example =
FakeExample.from_obj(
message[:example],
process_id: message[:process_id],
command_string: message[:command_string],
)
@reporter.example_pending(example)
when "example_failed"
example = FakeExample.from_obj(message[:example], message[:process_id])
example =
FakeExample.from_obj(
message[:example],
process_id: message[:process_id],
command_string: message[:command_string],
)
@reporter.example_failed(example)
@failure_count += 1
if fail_fast_met

View File

@ -0,0 +1,124 @@
# frozen_string_literal: true
RSpec.describe TurboTests::Flaky::Manager do
fab!(:rspec_example_1) do
RSpec::Core::Example
.describe
.example("rspec example 1")
.tap do |example|
example.execution_result.status = :failed
example.execution_result.exception =
StandardError
.new(
"some error\n\n#{TurboTests::Flaky::FailedExample::SCREENSHOT_PREFIX}/some/path/to/screenshot.png",
)
.tap { |exception| exception.set_backtrace(["some backtrace"]) }
end
end
fab!(:rspec_example_2) do
RSpec::Core::Example
.describe
.example("rspec example 2")
.tap do |example|
example.execution_result.status = :failed
example.execution_result.exception =
StandardError
.new("some error")
.tap { |exception| exception.set_backtrace(["some backtrace"]) }
end
end
fab!(:fake_example_1) do
TurboTests::FakeExample.from_obj(
TurboTests::JsonExample.new(rspec_example_1).to_json,
process_id: 1,
command_string: "some command string",
)
end
fab!(:fake_example_2) do
TurboTests::FakeExample.from_obj(
TurboTests::JsonExample.new(rspec_example_2).to_json,
process_id: 2,
command_string: "some other command string",
)
end
def with_fake_path
tmp_file = Tempfile.new
stub_const(TurboTests::Flaky::Manager, "PATH", tmp_file.path) { yield }
ensure
tmp_file.delete
end
describe ".potential_flaky_tests" do
it "should return the failed examples' `location_rerun_argument`" do
with_fake_path do
TurboTests::Flaky::Manager.log_potential_flaky_tests([fake_example_1, fake_example_2])
expect(TurboTests::Flaky::Manager.potential_flaky_tests).to eq(
%w[
./spec/lib/turbo_tests/flaky/manager_spec.rb:7
./spec/lib/turbo_tests/flaky/manager_spec.rb:22
],
)
end
end
end
describe ".log_potential_flaky_tests" do
it "should log the failed examples correctly" do
with_fake_path do
TurboTests::Flaky::Manager.log_potential_flaky_tests([fake_example_1, fake_example_2])
expect(JSON.parse(File.read(TurboTests::Flaky::Manager::PATH))).to eq(
[
{
"message_lines" =>
"Failure/Error: Unable to infer file and line number from backtrace\n\nStandardError:\n some error\n\n [Screenshot Image]: /some/path/to/screenshot.png",
"description" => "rspec example 1",
"backtrace" => ["some backtrace"],
"failure_screenshot_path" => "/some/path/to/screenshot.png",
"location_rerun_argument" => "./spec/lib/turbo_tests/flaky/manager_spec.rb:7",
"rerun_command" => "some command string",
},
{
"message_lines" =>
"Failure/Error: Unable to infer file and line number from backtrace\n\nStandardError:\n some error",
"description" => "rspec example 2",
"backtrace" => ["some backtrace"],
"failure_screenshot_path" => nil,
"location_rerun_argument" => "./spec/lib/turbo_tests/flaky/manager_spec.rb:22",
"rerun_command" => "some other command string",
},
],
)
end
end
end
describe ".remove_example" do
it "should remove the from the log file" do
with_fake_path do
TurboTests::Flaky::Manager.log_potential_flaky_tests([fake_example_1, fake_example_2])
TurboTests::Flaky::Manager.remove_example([rspec_example_1])
parsed_json = JSON.parse(File.read(TurboTests::Flaky::Manager::PATH))
expect(parsed_json.size).to eq(1)
expect(parsed_json.first["description"]).to eq("rspec example 2")
end
end
it "should delete the log file if there are no more examples" do
with_fake_path do
TurboTests::Flaky::Manager.log_potential_flaky_tests([fake_example_1, fake_example_2])
TurboTests::Flaky::Manager.remove_example([rspec_example_1, rspec_example_2])
expect(File.exist?(TurboTests::Flaky::Manager::PATH)).to eq(false)
end
end
end
end