Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

runtime,time: timer.Stop returns false even when no value is read from the channel #69312

Closed
sukunrt opened this issue Sep 6, 2024 · 16 comments
Labels
compiler/runtime Issues related to the Go compiler and/or runtime. FixPending Issues that have a fix which has not yet been reviewed or submitted. NeedsFix The path to resolution is known, but the work has not been done.
Milestone

Comments

@sukunrt
Copy link

sukunrt commented Sep 6, 2024

Go version

go1.23.0 linux/amd64

Output of go env in your module/workspace:

GO111MODULE=''
GOARCH='amd64'
GOBIN=''
GOCACHE='/home/sukun/.cache/go-build'
GOENV='/home/sukun/.config/go/env'
GOEXE=''
GOEXPERIMENT=''
GOFLAGS=''
GOHOSTARCH='amd64'
GOHOSTOS='linux'
GOINSECURE=''
GOMODCACHE='/home/sukun/go/pkg/mod'
GONOPROXY=''
GONOSUMDB=''
GOOS='linux'
GOPATH='/home/sukun/go'
GOPRIVATE=''
GOPROXY='https://proxy.golang.org,direct'
GOROOT='/home/sukun/dev/go'
GOSUMDB='sum.golang.org'
GOTMPDIR=''
GOTOOLCHAIN='auto'
GOTOOLDIR='/home/sukun/dev/go/pkg/tool/linux_amd64'
GOVCS=''
GOVERSION='go1.23.0'
GODEBUG=''
GOTELEMETRY='local'
GOTELEMETRYDIR='/home/sukun/.config/go/telemetry'
GCCGO='gccgo'
GOAMD64='v1'
AR='ar'
CC='gcc'
CXX='g++'
CGO_ENABLED='1'
GOMOD='/home/sukun/dev/scratch/timer/go.mod'
GOWORK=''
CGO_CFLAGS='-O2 -g'
CGO_CPPFLAGS=''
CGO_CXXFLAGS='-O2 -g'
CGO_FFLAGS='-O2 -g'
CGO_LDFLAGS='-O2 -g'
PKG_CONFIG='pkg-config'
GOGCCFLAGS='-fPIC -m64 -pthread -Wl,--no-gc-sections -fmessage-length=0 -ffile-prefix-map=/tmp/go-build59502758=/tmp/go-build -gno-record-gcc-switches'

What did you do?

For the new unbuffered timers introduced in go1.23, I expected that timer.Stop would return true, if no value is read from timer channel. This is not the case, sometimes timer.Stop returns false even when no value is read from the channel and the timer wasn't stopped before. https://go.dev/play/p/t_vaPhSAkmw

In production this is breaking quic-go: quic-go/quic-go#4659
There are details in that PR about how it's breaking Kubo v0.30.0-rc2(https://github.com/ipfs/kubo/tree/v0.30.0-rc2)

What did you see happen?

timer.Stop returns false even if no value was read. And there's no corresponding value in the timer channel. This breaks existing code which relies on tracking whether it read from the timer channel or not like the one in quic-go here: https://github.com/quic-go/quic-go/blob/master/internal/utils/timer.go

What did you expect to see?

I expected it to always return true if no value was read from the channel.

Looking at the changelog: https://go-review.googlesource.com/c/go/+/568341 there's an example of correct timer usage that shouldn't be broken.

func main() {
		t := time.NewTimer(2 * time.Second)
		time.Sleep(3 * time.Second)
		if !t.Reset(2*time.Second) {
			<-t.C
		}
		<-t.C
	}
	

I believe this will break(block forever) too in some circumstances. From the code we can see why this would happen. unlockAndRun updates the timer state and releases timer lock. modify reads t.when == 0 and returns false, then unlockAndRun gets the sendLock but cancels the push to the channel.
Repro here: https://go.dev/play/p/y_PZbPlwqrM

@gopherbot gopherbot added the compiler/runtime Issues related to the Go compiler and/or runtime. label Sep 6, 2024
@cuonglm
Copy link
Member

cuonglm commented Sep 6, 2024

From Timer.Stop documentation:

// For a chan-based timer created with NewTimer(d), as of Go 1.23,
// any receive from t.C after Stop has returned is guaranteed to block
// rather than receive a stale time value from before the Stop;
// if the program has not received from t.C already and the timer is
// running, Stop is guaranteed to return true.

So even if you don't read from the channel, but the timer is not running -> Stop could return false.

@ianlancetaylor
Copy link
Member

The text of the issue talks about timer.Stop but the example only calls timer.Reset. Is the problem with Stop or Reset?

@sukunrt
Copy link
Author

sukunrt commented Sep 6, 2024

It's with both:
The repro for Stop: https://go.dev/play/p/t_vaPhSAkmw
The repro for Reset: https://go.dev/play/p/y_PZbPlwqrM

Both the examples Reset the timer. Then they call Stop or Reset without reading from the channel. The calls should return true, but some times they return false.

@sukunrt
Copy link
Author

sukunrt commented Sep 6, 2024

I've simplified the examples:

Stop: https://go.dev/play/p/ofa64fUdVBT
Reset: https://go.dev/play/p/wWb_381RSqq

@ianlancetaylor
Copy link
Member

Thanks. In https://go.dev/play/p/ofa64fUdVBT there is a race between running the timer and stopping the timer. The code requires that Stop return true. But Stop is permitted to return false if the timer has already expired. The documentation for Stop says "if the program has not received from t.C already and the timer is running, Stop is guaranteed to return true." In the failure case, the timer has already expired.

If I understand you correctly, you are saying that because the timer is running, and because nothing has received from the channel, then Stop must return true. But in the failing case, the timer has expired.

The code you point to in runtime/time.go is intended to handle exactly the case: a timer that has expired racing with a call to Stop. We want to guarantee that after Stop returns we will not receive a value on the channel. So even though the timer has expired, if there is a racing call to Stop, we ensure that we don't send a value on the channel. That seems like a valid interpretation of the race.

So it's not yet obvious to me that there is a bug here. We have to handle the race somehow. We want to be sure that after Stop returns there is no value on the channel. But Stop can still return false if the timer has expired, and in that case Stop can also prevent the value from being sent on the channel.

Or so it seems to me. What am I missing? Thanks.

@sukunrt
Copy link
Author

sukunrt commented Sep 6, 2024

Thanks @ianlancetaylor, @cuonglm. Some thoughts.

  1. This breaks all existing code that relies on return value of timer.Stop or timer.Reset to drain from the channel. An example here: https://github.com/quic-go/quic-go/blob/master/internal/utils/timer.go

I believe this is a bug because in the changelist entry that introduced this change there is this example:

Note that a correct version of the program would be:

	func main() {
		t := time.NewTimer(2 * time.Second)
		time.Sleep(3 * time.Second)
		if !t.Reset(2*time.Second) {
			<-t.C
		}
		<-t.C
	}

This works with either semantics, by heeding t.Reset's result.
The change should not affect correct programs.

This example also suffers from the same race condition. The same argument for Stop applies to Reset. If for some reason the timer execution is delayed, t.Reset in the example might return false in go1.23 and there will be no value in the channel after t.Reset returns.

If I understand you correctly, you are saying that because the timer is running, and because nothing has received from the channel, then Stop must return true. But in the failing case, the timer has expired.

The timer only "expires" when the value is read from the channel, no?
Consider this example. We consider that this timer hasn't "expired" when Stop is called.

timer := time.NewTimer(1 * time.Millisecond)
time.Sleep(10 * time.Second)
fmt.Println(timer.Stop()) // This will surely return true

This applies to @cuonglm 's point too

So even if you don't read from the channel, but the timer is not running -> Stop could return false.

I'm uncomfortable with this definition of not running
There are three stages to executing the timer.

  1. We determine the timer can be executed, now > t.when and wait for t.lock
  2. Acquire t.lock and update the timer state t.when = 0, release t.lock and wait for t.stopLock
  3. Acquire t.stopLock and execute t.f

If t.Stop executes after 1 => return true
If t.Stop executes after 2 before 3 => return false
If t.Stop executes after 3 => return true

Why is the timer not running when Stop executes after 2 but is running when Stop executes after 3?

  1. If this is expected behavior, what's the way to Reset existing timers such that it works with both go1.22 and go1.23 and that there are no values from before the Reset in the timer channel.
    Is there a clean solution I am missing?

@ianlancetaylor
Copy link
Member

OK, I think I see what you are getting at. In the racing case, Stop is preventing the timer from sending a value on the channel. Therefore, the timer has been stopped, and Stop should return true.

@ianlancetaylor
Copy link
Member

@gopherbot Please open a backport to 1.23.

This bug makes it difficult or impossible to write timer code that uses Stop and Reset and works correctly for all versions of Go.

@gopherbot
Copy link
Contributor

Backport issue(s) opened: #69333 (for 1.23).

Remember to create the cherry-pick CL(s) as soon as the patch is submitted to master, according to https://go.dev/wiki/MinorReleases.

@gopherbot
Copy link
Contributor

Change https://go.dev/cl/611496 mentions this issue: runtime: if stop/reset races with running timer, return correct result

@dmitshur dmitshur added this to the Go1.24 milestone Sep 9, 2024
@dmitshur dmitshur added NeedsFix The path to resolution is known, but the work has not been done. FixPending Issues that have a fix which has not yet been reviewed or submitted. labels Sep 9, 2024
oncilla added a commit to oncilla/scion that referenced this issue Sep 16, 2024
Until golang/go#69312 is resolved,
force the old timer behavior by specifying an older go version
in the go.mod file.
@dmitshur dmitshur moved this to In Progress in Go Compiler / Runtime Sep 17, 2024
oncilla added a commit to scionproto/scion that referenced this issue Sep 17, 2024
Until golang/go#69312 is resolved, force the
old timer behavior by specifying an older go version in the go.mod file.

Fixes #4606
@gopherbot
Copy link
Contributor

Change https://go.dev/cl/616096 mentions this issue: [release-branch.go1.23] runtime: if stop/reset races with running timer, return correct result

gopherbot pushed a commit that referenced this issue Sep 27, 2024
…er, return correct result

The timer code is careful to ensure that if stop/reset is called
while a timer is being run, we cancel the run. However, the code
failed to ensure that in that case stop/reset returned true,
meaning that the timer had been stopped. In the racing case
stop/reset could see that t.when had been set to zero,
and return false, even though the timer had not and never would fire.

Fix this by tracking whether a timer run is in progress,
and using that to reliably detect that the run was cancelled,
meaning that stop/reset should return true.

For #69312
Fixes #69333

Change-Id: I78e870063eb96650638f12c056e32c931417c84a
Reviewed-on: https://go-review.googlesource.com/c/go/+/611496
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Cuong Manh Le <cuong.manhle.vn@gmail.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Ian Lance Taylor <iant@golang.org>
(cherry picked from commit 2ebaff4)
Reviewed-on: https://go-review.googlesource.com/c/go/+/616096
Reviewed-by: Ian Lance Taylor <iant@google.com>
Commit-Queue: Ian Lance Taylor <iant@google.com>
Auto-Submit: Ian Lance Taylor <iant@google.com>
@mknyszek mknyszek changed the title runtime: timer.Stop returns false even when no value is read from the channel runtime,time: timer.Stop returns false even when no value is read from the channel Oct 1, 2024
@gopherbot
Copy link
Contributor

Change https://go.dev/cl/617497 mentions this issue: runtime: clear isSending bit earlier

gopherbot pushed a commit that referenced this issue Oct 2, 2024
I've done some more testing of the new isSending field.
I'm not able to get more than 2 bits set. That said,
with this change it's significantly less likely to have even
2 bits set. The idea here is to clear the bit before possibly
locking the channel we are sending the value on, thus avoiding
some delay and some serialization.

For #69312

Change-Id: I8b5f167f162bbcbcbf7ea47305967f349b62b0f4
Reviewed-on: https://go-review.googlesource.com/c/go/+/617497
Reviewed-by: Ian Lance Taylor <iant@google.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Commit-Queue: Ian Lance Taylor <iant@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Ian Lance Taylor <iant@google.com>
@gopherbot
Copy link
Contributor

Change https://go.dev/cl/617596 mentions this issue: [release-branch.go1.23] runtime: clear isSending bit earlier

gopherbot pushed a commit that referenced this issue Oct 8, 2024
I've done some more testing of the new isSending field.
I'm not able to get more than 2 bits set. That said,
with this change it's significantly less likely to have even
2 bits set. The idea here is to clear the bit before possibly
locking the channel we are sending the value on, thus avoiding
some delay and some serialization.

For #69312
For #69333

Change-Id: I8b5f167f162bbcbcbf7ea47305967f349b62b0f4
Reviewed-on: https://go-review.googlesource.com/c/go/+/617596
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Ian Lance Taylor <iant@google.com>
Commit-Queue: Ian Lance Taylor <iant@google.com>
Auto-Submit: Ian Lance Taylor <iant@google.com>
@gopherbot
Copy link
Contributor

Change https://go.dev/cl/621616 mentions this issue: runtime,time: use atomic.Int32 for isSending

gopherbot pushed a commit that referenced this issue Oct 21, 2024
This change switches isSending to be an atomic.Int32 instead of an
atomic.Uint8. The Int32 version is managed as a counter, which is
something that we couldn't do with Uint8 without adding a new intrinsic
which may not be available on all architectures.

That is, instead of only being able to support 8 concurrent timer
firings on the same timer because we only have 8 independent bits to set
for each concurrent timer firing, we can now have 2^31-1 concurrent
timer firings before running into any issues. Like the fact that each
bit-set was matched with a clear, here we match increments with
decrements to indicate that we're in the "sending on a channel" critical
section in the timer code, so we can report the correct result back on
Stop or Reset.

We choose an Int32 instead of a Uint32 because it's easier to check for
obviously bad values (negative values are always bad) and 2^31-1
concurrent timer firings should be enough for anyone.

Previously, we avoided anything bigger than a Uint8 because we could
pack it into some padding in the runtime.timer struct. But it turns out
that the type that actually matters, runtime.timeTimer, is exactly 96
bytes in size. This means its in the next size class up in the 112 byte
size class because of an allocation header. We thus have some free space
to work with. This change increases the size of this struct from 96
bytes to 104 bytes.

(I'm not sure if runtime.timer is often allocated directly, but if it
is, we get lucky in the same way too. It's exactly 80 bytes in size,
which means its in the 96-byte size class, leaving us with some space to
work with.)

Fixes #69969.
Related to #69880 and #69312.

Change-Id: I9fd59cb6a69365c62971d1f225490a65c58f3e77
Cq-Include-Trybots: luci.golang.try:gotip-linux-amd64-longtest
Reviewed-on: https://go-review.googlesource.com/c/go/+/621616
Reviewed-by: Ian Lance Taylor <iant@google.com>
Auto-Submit: Michael Knyszek <mknyszek@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
@gopherbot
Copy link
Contributor

Change https://go.dev/cl/621856 mentions this issue: [release-branch.go1.23] runtime,time: use atomic.Int32 for isSending

gopherbot pushed a commit that referenced this issue Oct 25, 2024
This change switches isSending to be an atomic.Int32 instead of an
atomic.Uint8. The Int32 version is managed as a counter, which is
something that we couldn't do with Uint8 without adding a new intrinsic
which may not be available on all architectures.

That is, instead of only being able to support 8 concurrent timer
firings on the same timer because we only have 8 independent bits to set
for each concurrent timer firing, we can now have 2^31-1 concurrent
timer firings before running into any issues. Like the fact that each
bit-set was matched with a clear, here we match increments with
decrements to indicate that we're in the "sending on a channel" critical
section in the timer code, so we can report the correct result back on
Stop or Reset.

We choose an Int32 instead of a Uint32 because it's easier to check for
obviously bad values (negative values are always bad) and 2^31-1
concurrent timer firings should be enough for anyone.

Previously, we avoided anything bigger than a Uint8 because we could
pack it into some padding in the runtime.timer struct. But it turns out
that the type that actually matters, runtime.timeTimer, is exactly 96
bytes in size. This means its in the next size class up in the 112 byte
size class because of an allocation header. We thus have some free space
to work with. This change increases the size of this struct from 96
bytes to 104 bytes.

(I'm not sure if runtime.timer is often allocated directly, but if it
is, we get lucky in the same way too. It's exactly 80 bytes in size,
which means its in the 96-byte size class, leaving us with some space to
work with.)

Fixes #69978
For #69969.
Related to #69880 and #69312 and #69882.

Change-Id: I9fd59cb6a69365c62971d1f225490a65c58f3e77
Cq-Include-Trybots: luci.golang.try:go1.23-linux-amd64-longtest
Reviewed-on: https://go-review.googlesource.com/c/go/+/621616
Reviewed-by: Ian Lance Taylor <iant@google.com>
Auto-Submit: Michael Knyszek <mknyszek@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
(cherry picked from commit 6a49f81)
Reviewed-on: https://go-review.googlesource.com/c/go/+/621856
Auto-Submit: Ian Lance Taylor <iant@google.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
compiler/runtime Issues related to the Go compiler and/or runtime. FixPending Issues that have a fix which has not yet been reviewed or submitted. NeedsFix The path to resolution is known, but the work has not been done.
Projects
Development

No branches or pull requests

6 participants