Failures that happen in after do not trigger a retry #110

Closed
schneems opened this issue Aug 10, 2020 · 1 comment

@schneems (Contributor):

From: https://app.circleci.com/pipelines/github/heroku/hatchet/160/workflows/37a65f89-4da7-4c54-bdb0-09eac28cc942/jobs/474

    An error occurred in an `after(:context)` hook.
    Failure/Error: @app.run_multi("ls") { |out| expect(out).to include("Gemfile") }
      expected "" to include "Gemfile"
    # ./spec/hatchet/app_spec.rb:217:in `block (4 levels) in <top (required)>'
    # ./lib/hatchet/app.rb:240:in `block in run_multi'

    RSpec::Retry: 2nd try ./spec/hatchet/app_spec.rb:174
    .

    Finished in 22.84 seconds (files took 2.11 seconds to load)
    2 examples, 0 failures, 1 error occurred outside of examples

For whatever reason, sometimes `heroku run <command>` comes back empty. I wish I knew why so I could fix it, but until then we're managing the situation by relying on rspec-retry. The problem is that the failure in this test doesn't surface inside the test context, where rspec-retry could catch it, but instead in the after(:all) step:

    before(:all) do
      skip("Must set HATCHET_EXPENSIVE_MODE") unless ENV["HATCHET_EXPENSIVE_MODE"]

      @app = Hatchet::GitApp.new("default_ruby", run_multi: true)
      @app.deploy
    end

    after(:all) do
      @app.teardown! if @app # <====== ERROR fires here because it's where we're waiting on threads to join
    end

    it "test one" do
      @app.run_multi("ls") { |out| expect(out).to include("Gemfile") }  # <== This is where the thread is spawned that the error comes from
      expect(@app.platform_api.formation.list(@app.name).detect {|ps| ps["type"] == "web"}["size"].downcase).to_not eq("free")
    end
@schneems (Contributor, Author):

Here are some options:

  1. Have run_multi commands automatically retry (up to Hatchet's default retry value) when they see an exception. The downside is that when tests fail for "valid" reasons rather than flappiness, they now take much longer to fail.
  2. Explore whether the retry behavior can be applied retroactively to a test whose after block fails. This would be tricky to generalize: for an after(:all) you wouldn't know which example the failure came from, and would have to re-run the entire group. This would be very invasive and difficult.
  3. Add or document some kind of manual join option to ensure the run passed or failed inside the test itself (where it can safely be re-run). The downside is that it introduces a complex manual step the developer has to remember.

Option 3 is probably the easiest to implement and buys us the most transparency into what's happening. It could look like this if run_multi returned the thread instead of true:

    it "test one" do
      background_tasks = []
      background_tasks << @app.run_multi("ls") { |out| expect(out).to include("Gemfile") }
      expect(@app.platform_api.formation.list(@app.name).detect {|ps| ps["type"] == "web"}["size"].downcase).to_not eq("free")

      background_tasks.map(&:join)
    end

But that's pretty ugly.

  4. Introduce a retry: flag on run_multi so the developer can opt in to retries manually. The downside is that it still requires manual intervention.

  5. Document the behavior and discourage it: if someone wants to write a test this way, they can use the non-concurrent run in multiple processes.
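Option 4 could be sketched roughly like this. To be clear, no retry: flag exists in Hatchet today, and run_with_retry below is an illustrative stand-in with a simulated flaky command, not Hatchet's API:

```ruby
# Hypothetical sketch of option 4: retry the flaky command inside the
# run_multi machinery. Names here (run_with_retry, retries:) are
# illustrative assumptions, not Hatchet's actual interface.
def run_with_retry(retries: 3)
  attempts = 0
  begin
    attempts += 1
    out = yield                       # run the (possibly flaky) command
    raise "empty output" if out.empty?
    out
  rescue RuntimeError
    retry if attempts < retries       # re-run instead of failing the test
    raise
  end
end

# Simulate `heroku run ls` coming back empty once, then succeeding:
responses = ["", "Gemfile Rakefile"]
result = run_with_retry { responses.shift }
puts result # => "Gemfile Rakefile"
```

Note that this retries the whole block, so any expectations inside it would also be re-run on each attempt.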
