Supervisor strangeness #3394 · Closed

durban opened this issue Feb 2, 2023 · 7 comments · Fixed by #3972

Comments

@durban (Contributor) commented Feb 2, 2023

It seems that a closed (finalized) Supervisor happily starts fibers and never cancels/joins them:

import scala.concurrent.duration._

import cats.effect.{ IO, IOApp }
import cats.effect.std.Supervisor

object Sup extends IOApp.Simple {

  def run: IO[Unit] = {
    Supervisor[IO](await = false).use { sup =>
      sup.supervise(IO.println("started 1") *> IO.never.onCancel(IO.println("cancelled 1"))).as(sup)
    }.flatMap { sup =>
      sup.supervise(IO.println("started 2") *> IO.never.onCancel(IO.println("cancelled 2"))) *> (
        IO.sleep(1.second) *> IO.println("done")
      )
    }
  }
}

This prints:

started 1
cancelled 1
started 2
done

Obviously we shouldn't leak the Supervisor object, but still, this feels wrong...

@alexandrustana (Contributor) commented:

I would like to give it a stab 🥊

But if it's not too much to ask, I'd like to clarify a few aspects.

  1. In order to check whether the Supervisor instance has been closed, can I assume that
       doneR <- Resource.eval(F.ref(false))
     acts as a flag showing whether the Supervisor was finalized or not?
  2. What is the expected behaviour if one uses an already closed (finalized) Supervisor? Should it raiseError? Should the caller be suspended indefinitely?

@armanbilge (Member) commented:

@alexandrustana awesome, thanks!


In order to check if the Supervisor instance has been closed

Hmm, that doneR stuff looks suspect to me. The problem is there's a race condition:

  1. Thread A: check doneR, get false. Proceed to submit task ...
  2. Thread B: close supervisor, doneR.set(true), join/cancel all fibers.
  3. Thread A: ... submit task to closed supervisor.

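The interleaving above can be replayed deterministically in plain Scala. This is a hypothetical model, not the real implementation: an AtomicBoolean stands in for doneR and a list of strings stands in for the supervised fibers.

```scala
import java.util.concurrent.atomic.AtomicBoolean
import scala.collection.mutable.ListBuffer

object CheckThenActRace {
  /** Replays the racy interleaving step by step and returns the tasks
    * that leaked into the supervisor after it was closed. */
  def replay(): List[String] = {
    val done  = new AtomicBoolean(false)  // stand-in for doneR
    val tasks = ListBuffer.empty[String]  // stand-in for supervised fibers

    val observedOpen = !done.get()        // 1. A: check doneR, get false
    done.set(true)                        // 2. B: close supervisor,
    tasks.clear()                         //    join/cancel all fibers
    if (observedOpen) tasks += "task"     // 3. A: submit to closed supervisor

    tasks.toList                          // the leaked task survives the close
  }

  def main(args: Array[String]): Unit =
    println(s"leaked after close: ${replay()}")
}
```

Note that each individual step is atomic (the flag itself is thread-safe), yet the composite check-then-submit is not, which is why wrapping the flag in a Ref alone does not help.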
We had and have various bugs in Dispatcher due to these kinds of race conditions.

We currently have two implementations of Supervisor.

  1. For the implementation based on Concurrent/Ref, we can incorporate its open/closed status directly into its state. Then we can do atomic updates on the Ref and only start the task if we were successfully able to register it on a non-closed Supervisor. Then there is no race condition, and we can make doneR go away.

  2. For the implementation based on Async/ConcurrentHashMap there might be no way we can handle the open/closed status atomically. So then we will need to keep doneR and do a "double-check": after registering the task, we re-read doneR, and if it's true we assume the Supervisor was already closed, self-cleanup, and raise an error.
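A minimal sketch of option 1, using a plain AtomicReference in place of the Ref (hypothetical names; the real state would track the fibers themselves, not strings). The point is that the open/closed check and the registration happen in a single compare-and-set, so the race window disappears:

```scala
import java.util.concurrent.atomic.AtomicReference
import scala.annotation.tailrec

object AtomicSupervisorState {
  sealed trait State
  final case class Open(registered: List[String]) extends State
  case object Closed extends State

  val state = new AtomicReference[State](Open(Nil))

  /** Atomically registers a task; returns false if the supervisor is
    * already closed, so the caller can raise an error instead of
    * starting the fiber. */
  @tailrec def tryRegister(task: String): Boolean =
    state.get() match {
      case old @ Open(ts) =>
        if (state.compareAndSet(old, Open(task :: ts))) true
        else tryRegister(task) // lost a CAS race; retry with fresh state
      case Closed => false
    }

  /** Closing flips the state exactly once; afterwards no registration
    * can ever succeed. Returns the tasks to join/cancel. */
  def close(): List[String] =
    state.getAndSet(Closed) match {
      case Open(ts) => ts
      case Closed   => Nil
    }
}
```

With a Ref this is the same shape expressed via modify, returning whether the registration succeeded; the Async/ConcurrentHashMap implementation cannot collapse the two steps like this, hence the double-check described above.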


What is the expected behaviour

We use an IllegalStateException in Dispatcher; we can do a similar thing here:

throw new IllegalStateException("dispatcher already shutdown")


Let me know if you have questions about anything!

@alexandrustana (Contributor) commented:

Hmm, that doneR stuff looks suspect to me. The problem is there's a race condition:

But since doneR is wrapped in a Ref, shouldn't it be safe to use even in a race condition?

@armanbilge (Member) commented:

But since doneR is wrapped in a Ref shouldn't it be safe to use even in a race condition?

Not really. Imagine the following scenario where doneR is a Ref:

  1. Thread A: check doneR, get false. Proceed to submit task ...
  2. Thread B: close supervisor, doneR.set(true), join/cancel all fibers.
  3. Thread A: ... submit task to closed supervisor.

@alexandrustana (Contributor) commented:

Took a while, but I finally understood the race condition 😅 sorry for that.

In my mind, there were some delays between the threads.

@alexandrustana (Contributor) commented:

Only now did I see on Discord that @mn98 already wanted to fix this issue.

Didn't intend to steal the ticket 😞 sorry.

@mn98 (Contributor) commented Oct 5, 2023

Didn't intend to steal the ticket 😞 sorry.

That is absolutely fine, I am very happy to follow along here and learn 🙂
