
IODeferred's CallbackStack becomes unboundedly large #3935

Closed
armanbilge opened this issue Jan 8, 2024 · 11 comments · Fixed by #3936

@armanbilge (Member) commented Jan 8, 2024

h/t @mernst-github in #3359 (comment).

Minimizer:

```scala
//> using dep org.typelevel::cats-effect::3.5.2

import cats.effect._

object Leak extends IOApp.Simple {

  def run =
    IO.deferred[Unit].flatMap { d =>
      val getRace = d.get.race(IO.unit)
      getRace.replicateA_(100)
    }

}
```

Then I replaced the terminating `()` here with `println(clearCounter.get())`:

```scala
def clear(): Unit = {
  stack.clearCurrent(handle)
  val clearCount = clearCounter.incrementAndGet()
  if ((clearCount & (clearCount - 1)) == 0) // power of 2
    clearCounter.addAndGet(-callbacks.pack(clearCount))
  ()
}
```
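As an aside, `(clearCount & (clearCount - 1)) == 0` is the standard bit trick for "is a power of two": packing runs only when the cleared count hits 1, 2, 4, 8, 16, ..., i.e. exponentially less often as the counter grows. A minimal sketch (`isPowerOfTwo` is an illustrative helper, not part of the codebase):

```scala
// A positive Int is a power of two iff it has exactly one bit set;
// n - 1 flips that bit and sets all the bits below it, so n & (n - 1)
// is zero exactly for powers of two.
def isPowerOfTwo(n: Int): Boolean = n > 0 && (n & (n - 1)) == 0

assert((1 to 16).filter(isPowerOfTwo) == Seq(1, 2, 4, 8, 16))
```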

Here is the output:

```
1
2
3
3
4
5
6
7
5
6
7
6
7
7
8
9
10
11
12
13
14
15
9
10
11
12
13
14
15
10
11
12
13
14
15
11
12
13
14
15
12
13
14
15
13
14
15
14
15
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
18
19
20
21
22
23
24
25
26
27
28
29
30
31
19
20
21
22
23
```
@djspiewak (Member):

So are we sure this is growing unboundedly? The `println`s don't really show that. If you squint, it does kinda look like half-ish is being cleared each time.

@armanbilge (Member, Author):

Yes, half-ish of the total is being cleared each time, but each time the total has doubled in size. You can run my reproducer with a larger `replicateA_` value and see, as sketched below.
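For instance (the same minimizer with an arbitrarily larger count; `1000000` is my own choice):

```scala
//> using dep org.typelevel::cats-effect::3.5.2

import cats.effect._

object Leak extends IOApp.Simple {

  def run =
    IO.deferred[Unit].flatMap { d =>
      // Every race registers a callback on `d` and then cancels it once
      // IO.unit wins; the cancelled node lingers in the CallbackStack
      // until a pack() removes it.
      d.get.race(IO.unit).replicateA_(1000000)
    }

}
```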

@djspiewak (Member):

Not each time though. We repeat clearance points several times.

(quotes the `println` output above)

It certainly feels like something here is wrong, but I'm not sure if it's a conventional example of unbounded growth.

@armanbilge (Member, Author):

You're right, it's not each time, sorry. Every time we `pack()` we "lose" 1. So to move up to the next power of 2, first we have to "lose" that many: in the output above, once the counter reaches 15, each pack at 16 removes one fewer node than the last (7, 6, 5, ...) until a pack removes nothing, and only then does the counter climb into the 16..31 range.

@djspiewak (Member):

Ah, I see what you're pointing out. Okay, gotcha. I'm surprised the off-by-one fix doesn't correct it then…

@mernst-github (Contributor) commented Jan 8, 2024

We should expect `require(clearCounter.addAndGet(-callbacks.pack(clearCount)) >= 0)`, right? That bombs for me (with and without this PR).
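Spelled out against the instrumented `clear()` above, that check would look like this (a sketch):

```scala
def clear(): Unit = {
  stack.clearCurrent(handle)
  val clearCount = clearCounter.incrementAndGet()
  if ((clearCount & (clearCount - 1)) == 0) // power of 2
    // If pack() ever reports more removals than cells were counted as
    // cleared, the counter goes negative and this require fails.
    require(clearCounter.addAndGet(-callbacks.pack(clearCount)) >= 0)
  ()
}
```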

@armanbilge (Member, Author):

@mernst-github would you be able to post a self-contained reproducer somewhere? Thanks!

@mernst-github (Contributor):

I can see the printed values gradually plunge into negative territory.
JDK 17.0.9 and 21.0.1, macOS / AArch64 and Linux / x64.

@armanbilge (Member, Author) commented Jan 8, 2024

@mernst-github thanks. I tried that before, but I was unable to replicate your result; I probably did something wrong/different. A self-contained reproducer would help a lot :) e.g. a branch of CE with your reproducer in the example project, with the `println` added.

@mernst-github (Contributor):

If I'm not mistaken, the `pack` logic can overcount when called concurrently, since it may clear the same cell twice, for example like so:

```
CBS: root -> c1(SET) -> c2(CLEARED) -> null
T1:  c1.packInternal(bound, 0, root)
T1:  c1 not cleared, call c2.packInternal(bound-1, 0, c1)
   ~~~ context switch ~~~
<now c1 gets cleared>
T2:  c1.packInternal(bound, 0, root)
T2:  c1 cleared => root.CAS(c1, c2), call c2.packInternal(bound, 1, root)
CBS: root ----------> c2(CLEARED) -> null
          c1(CLEARED) /
   ~~~ context switch ~~~
T1 resumes @ c2.packInternal(bound-1, 0, c1)
T1:  c2 cleared => c1.CAS(c2, null)
CBS: root -> c2(CLEARED) -> null
          c1(CLEARED) -> null
T1:  return 1
   ~~~ context switch ~~~
T2 resumes @ c2.packInternal(bound, 1, root)
T2:  c2 cleared => root.CAS(c2, null)
CBS: root -> null
T2:  return 2
```

So two cells were cleared in total, but c2 got double-counted: clearCounter is decremented by 3.
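That interleaving can be replayed deterministically with a toy node type. The sketch below is single-threaded and mine, not the real `CallbackStack`; it just executes the steps of the trace in order and shows the combined removal count coming out as 3 for 2 cleared cells:

```scala
import java.util.concurrent.atomic.AtomicReference

object PackRaceReplay {
  // Toy stand-in for a CallbackStack cell: a cleared flag plus a
  // CAS-able next pointer.
  final class Node(var cleared: Boolean) {
    val next = new AtomicReference[Node](null)
  }

  def main(args: Array[String]): Unit = {
    val root = new Node(false)
    val c1   = new Node(false) // SET
    val c2   = new Node(true)  // CLEARED
    root.next.set(c1)
    c1.next.set(c2)

    var t1 = 0 // removals counted by T1's traversal
    var t2 = 0 // removals counted by T2's traversal

    // T1 passes over c1 (not yet cleared) and remembers it as the
    // predecessor of c2; then c1 is cleared under it.
    c1.cleared = true

    // T2 starts a fresh traversal: sees c1 cleared, unlinks it from root.
    if (root.next.compareAndSet(c1, c2)) t2 += 1

    // T1 resumes at c2: sees it cleared and unlinks it via its stale,
    // already-unlinked predecessor c1.
    if (c1.next.compareAndSet(c2, null)) t1 += 1

    // T2 reaches c2: also sees it cleared, unlinks it from root.
    if (root.next.compareAndSet(c2, null)) t2 += 1

    // Two cells were cleared, but c2 was counted by both traversals.
    println(s"T1 counted $t1, T2 counted $t2, total ${t1 + t2}") // 1, 2, 3
  }
}
```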

@armanbilge (Member, Author):

Thanks, you're right! I was able to capture it in a test in 6e2e87d, tracking in #3940.

abestel pushed a commit to abestel/fs2-kafka that referenced this issue Apr 12, 2024
Some context to start: Cats Effect has had memory leaks in CallbackStack since version 3.4.3.
See for example: typelevel/cats-effect#3935

I've been facing this memory leak in an application using fs2-kafka, and found that I'm not the only one (typelevel/cats-effect#3973).

Using a simple consume-then-produce stream application, I monitored the size of the CallbackStack with the following command:

```
while sleep 1; do jcmd <pid> GC.class_histogram | grep 'cats.effect.CallbackStack$Node' ; done
```

I found that swapping `F.race(shutdown, fetch)` for plain `fetch` stops the memory leak. This should not be an issue, because the stream is interrupted anyway by `.interruptWhen(F.race(shutdown, stopReqs.get).void.attempt)`, but I'm not 100% convinced of this.
aartigao pushed a commit to fd4s/fs2-kafka that referenced this issue Apr 17, 2024
(same commit message as above)

Co-authored-by: Adrien Bestel <adrien.bestel@daimlertruck.com>