runtime: state more explicitly the behavior for buffered channels in the chansend's fast path under extreme conditions. #36203

Closed

Conversation

@zhiqiangxu zhiqiangxu commented Dec 18, 2019

This information can be useful when implementing something like this:

write side:

eventCh := make(chan struct{}, 1) // buffer size of exactly 1, shared between read/write side
// something happened before (X)
// notify via chan
select {
case eventCh <- struct{}{}:
default:
}

read side:

for {
	select {
	case <-eventCh:
		// new event came, handle them (Y)
	}
}

This doc makes it explicit that right after <-eventCh on the read side, the selectnbsend on the write side is guaranteed to succeed (it won't observe a stale full()).
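
For concreteness, here is a self-contained, runnable version of this pattern. The loop bounds, the sleep, the done channel, and the print are placeholders added purely for illustration; they are not part of the proposed change.

package main

import (
	"fmt"
	"time"
)

func main() {
	eventCh := make(chan struct{}, 1) // buffer size of exactly 1, shared between read/write side
	done := make(chan struct{})

	// Write side: report "something happened" without ever blocking.
	// If a notification is already pending, the default case coalesces this one into it.
	go func() {
		for i := 0; i < 10; i++ {
			// something happened before (X)
			select {
			case eventCh <- struct{}{}:
			default:
			}
			time.Sleep(10 * time.Millisecond)
		}
		close(done)
	}()

	// Read side: each wakeup may stand for one or more coalesced events (Y).
	for {
		select {
		case <-eventCh:
			fmt.Println("handling accumulated events")
		case <-done:
			return
		}
	}
}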

@googlebot googlebot added the cla: yes label Dec 18, 2019
@gopherbot

This PR (HEAD: f84953a) has been imported to Gerrit for code review.

Please visit https://go-review.googlesource.com/c/go/+/211803 to see it.


@zhiqiangxu zhiqiangxu changed the title runtime: state more explicitly the behavior for buffered channels in the fast path under extreme conditions. runtime: state more explicitly the behavior for buffered channels in the chansend's fast path under extreme conditions. Dec 18, 2019
@gopherbot

Message from Ian Lance Taylor:

Patch Set 3:

I have no particular objections to the change but I don't understand the description. The Go language doesn't have a notion of "right after" except when using a happens-before relationship, but there doesn't seem to be one here.



@gopherbot

Message from 志强 徐:

Patch Set 3:

This patch only tries to elaborate on the side effect of lock release, which can be critical information for some use cases. I think the happens-before relationship in the language spec still needs more work; in particular, we should make it explicit that sync/atomic guarantees sequential consistency. I have commented on #5045.

Go doesn't have to adopt every C++ feature, but the C++ memory model seems much clearer: https://riptutorial.com/cplusplus/topic/7975/cplusplus11-memory-model?from=timeline&isappinstalled=0



@gopherbot

Message from Ian Lance Taylor:

Patch Set 3:

Thanks, but I'm talking about the CL description. You say "This doc makes it explicit that right after <-eventCh in the read side, the selectnbsend in the write side is guaranteed to succeed(it won't observe a stale full())." But as far as I can see there is no happens-before relationship from the write side to the read side in the code that you show. And the atomics used by the runtime package, in runtime/internal/atomic, are not the same as the atomics used by user programs, in the sync/atomic package. We shouldn't confuse the two of them.



@gopherbot

Message from 志强 徐:

Patch Set 3:

Yes, this is not the officially defined happens-before relationship of Go's memory model; it's just the side effect of lock release. But that side effect isn't documented anywhere else, so I tried to elaborate on it, since this information can be critical for some use cases :)

About the atomics: it seems the sync/atomic functions are mostly aliases for runtime/internal/atomic:

alias("sync/atomic", "LoadUint64", "runtime/internal/atomic", "Load64", all...)
...



@gopherbot

Message from Ian Lance Taylor:

Patch Set 3:

Implementation-wise, yes, but conceptually, no.

The behavior of the language is guaranteed by the spec. It's a category error to say that a comment in the runtime affects how the language behaves. I think it's simply wrong to say "This information can be critical when implement something like this:" People writing Go code should never need to look at comments in the implementation of the runtime package.

Again, the comment change may be fine. It's just the CL description that I am disagreeing with.



@gopherbot

Message from 志强 徐:

Patch Set 3:

I agree, it's best if we can put the side effects of lock release into the spec.



@gopherbot

Message from 志强 徐:

Patch Set 3:

I just changed the word "critical" to "useful" 😊



@zhiqiangxu zhiqiangxu force-pushed the fix_doc_for_chansend_fastpath branch from f84953a to 757cf4a on December 19, 2019 06:16
@gopherbot

This PR (HEAD: 757cf4a) has been imported to Gerrit for code review.

Please visit https://go-review.googlesource.com/c/go/+/211803 to see it.


@gopherbot

Message from Keith Randall:

Patch Set 5: Code-Review-1

(1 comment)



@gopherbot

Message from 志强 徐:

Patch Set 5:

If that goroutine has decremented qcount but not yet released the lock, then this goroutine might see a stale full(); but once that goroutine has released the lock, this goroutine is guaranteed to observe the decrement. This should be what's implied by the original comment about "the side effect of lock release".
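
To make the scenario concrete, here is a self-contained toy model of the two paths being debated. The names and types are hypothetical, and it uses sync.Mutex and sync/atomic in place of the runtime's internal lock and relaxed loads, so it is only an illustration of the claim, not the actual chan.go code.

package main

import (
	"runtime"
	"sync"
	"sync/atomic"
)

// toyChan models just the pieces of the runtime's hchan that matter here:
// a buffered-element count, a capacity, and a lock.
type toyChan struct {
	mu       sync.Mutex
	qcount   uint32 // buffered elements (atomic here; a plain word under the lock in the runtime)
	dataqsiz uint32 // buffer capacity
}

// full models the single-word, lock-free read in chansend's fast path.
func (c *toyChan) full() bool {
	return atomic.LoadUint32(&c.qcount) == c.dataqsiz
}

// recv models chanrecv: lock, dequeue one element, unlock.
func (c *toyChan) recv() {
	c.mu.Lock()
	atomic.AddUint32(&c.qcount, ^uint32(0)) // qcount--
	c.mu.Unlock()
}

// trySend models selectnbsend: fail fast if the channel looks full,
// otherwise take the lock and re-check before enqueueing.
func (c *toyChan) trySend() bool {
	if c.full() { // may still return a stale "full" answer
		return false
	}
	c.mu.Lock()
	defer c.mu.Unlock()
	if atomic.LoadUint32(&c.qcount) == c.dataqsiz {
		return false
	}
	atomic.AddUint32(&c.qcount, 1)
	return true
}

func main() {
	c := &toyChan{dataqsiz: 1, qcount: 1} // starts full
	go c.recv()
	// The question in this thread: once recv has released the lock, how soon
	// is trySend guaranteed to stop seeing a stale full() == true?
	for !c.trySend() {
		runtime.Gosched()
	}
}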



@gopherbot

Message from Keith Randall:

Patch Set 5:

I don't agree. Without synchronization, the reading goroutine can see a stale full() result even after the receiving goroutine has decremented qcount and released the lock.



@gopherbot

Message from 志强 徐:

Patch Set 5:

I think that's guaranteed by the side effect of the locked instruction inside unlock, which acts as a memory barrier.



@gopherbot

Message from Keith Randall:

Patch Set 5:

runtime.unlock calls runtime/internal/atomic.Xchg (assuming linux here, in runtime/lock_futex.go). That Xchg does a XCHGL instruction, which does do a bus lock on amd64 (even though it doesn't have an explicit LOCK prefix). But that's not enough to guarantee anything. You would need some sort of atomic on the chansend side, not just one on the chanrecv side. And that's only for amd64, other architectures have different guarantees on runtime.unlock. The only guarantee we want to depend on is that any writes before a runtime.unlock are guaranteed to be seen by another goroutine that does a subsequent runtime.lock.
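
A user-level analogue of that last guarantee, using sync.Mutex (whose happens-before behavior is documented in the Go memory model) rather than the runtime-internal lock, might look like this; the variable names are illustrative only.

package main

import "sync"

var (
	mu    sync.Mutex
	count = 1 // stands in for qcount on a full, one-slot buffer
)

// receiver plays the role of chanrecv: mutate the count under the lock, then unlock.
func receiver() {
	mu.Lock()
	count--
	mu.Unlock()
}

// sender plays the role of the send path. Because each iteration re-reads
// count under the same lock, the memory model guarantees the loop terminates:
// receiver's Unlock happens before any later Lock, so a locked read that
// follows the unlock must see the decrement. A loop that read count with no
// synchronization at all would get no such promise.
func sender() {
	for {
		mu.Lock()
		v := count
		mu.Unlock()
		if v == 0 {
			return
		}
	}
}

func main() {
	go receiver()
	sender()
}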



@gopherbot

Message from 志强 徐:

Patch Set 5:

The read side doesn't need to do anything special to observe the mutation; in fact, all an atomic load does is tell the compiler not to reorder it.



@gopherbot

Message from Keith Randall:

Patch Set 5:

That's true on amd64. There are lots of architectures for which that isn't true.



@gopherbot

Message from 志强 徐:

Patch Set 5:

Then what's the "side effect of lock release" on those architectures?



@gopherbot

Message from Keith Randall:

Patch Set 5:

I think it relates to the forward progress claim, in that the unlock in the receiving goroutine will cause the sending goroutine to see the changes eventually. (I think every architecture we deal with will ensure that a reader eventually sees a write, regardless of synchronization. Only the compiler might be able to delay observing a write indefinitely.)



@gopherbot

Message from 志强 徐:

Patch Set 5:

Isn't just "see the changes eventually" too weak? I tend to think the original comment is saying that the unlock acts as a memory barrier (a bus lock on amd64, something similar on other platforms), so the change is observed immediately. Anyway, this is never explicitly stated in the spec, and there's an issue about that which has been pending for 6 years: #5045



@gopherbot

Message from Ian Lance Taylor:

Patch Set 5:

Please don't confuse issue 5045, which is about the user visible behavior of the sync/atomic package, with the use of runtime/internal/atomic in the runtime package. They are not the same, even if the implementations are to some extent the same. For example, runtime/internal/atomic has LoadAcq and StoreRel, but sync/atomic does not.

The spec and the memory model say nothing about the use of atomic operations in the runtime package, and they never will.

Also, when it comes to atomic operations, a statement like "is observed immediately" doesn't mean anything. Different cores run independently with no common clock. There is no meaningful notion of "immediately." As Keith said, this is especially true on non-x86 processors, which have more relaxed semantics for atomic instructions.



@gopherbot

Message from 志强 徐:

Patch Set 5:

Aren't LoadAcq and StoreRel related to "observed immediately"? I think the Go runtime has already run into use cases where finer-grained control over the memory model than what the current spec offers is useful or needed, which is why LoadAcq and StoreRel exist in runtime/internal/atomic. So why aren't they exported to a user-facing package? Users may have similar use cases where they would be useful.



@gopherbot

Message from Ian Lance Taylor:

Patch Set 5:

LoadAcq and StoreRel aren't about arbitrary memory operations being observed immediately. If thread A does a StoreRel and then thread B does a LoadAcq, thread B will see all memory stores made by thread A before the StoreRel. A different thread C that has not done any atomic operations on that memory location may see some, all, or none of the changes made by thread A.
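
A user-level sketch of that message-passing discipline is below. It uses sync/atomic, whose operations behave as sequentially consistent (documented in the revised memory model since Go 1.19), which is at least as strong as the acquire/release pairing described above; the runtime-internal LoadAcq/StoreRel themselves are not available to user code.

package main

import (
	"fmt"
	"sync/atomic"
)

var (
	data  int   // ordinary memory written by "thread A"
	ready int32 // the flag: published by A's atomic store, consumed by B's atomic load
)

// threadA writes the payload, then publishes it (the StoreRel role).
func threadA() {
	data = 42
	atomic.StoreInt32(&ready, 1)
}

// threadB spins on the flag (the LoadAcq role); once it observes ready == 1,
// it is guaranteed to see data == 42. A third goroutine that read data
// without loading ready would get no such guarantee and might see 0 or 42.
func threadB() {
	for atomic.LoadInt32(&ready) == 0 {
		// wait for the flag
	}
	fmt.Println(data) // prints 42
}

func main() {
	done := make(chan struct{})
	go func() { threadB(); close(done) }()
	threadA()
	<-done
}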



@gopherbot

Message from 志强 徐:

Patch Set 5:

The wording may not be accurate, but the idea is the same. And these are good things to have in user code, not just internally in the runtime :)


