
proposal: cacheline-diverse allocation #19025

Closed

bcmills opened this issue Feb 10, 2017 · 4 comments

Comments

bcmills (Contributor) commented Feb 10, 2017

For certain applications (e.g. most applications of #18802, some possible implementations for #18177, and basically anything that farms out work to multiple goroutines in parallel), it is useful to be able to allocate a set of values with the constraint that they occupy disjoint cache lines.

For example, one might spin up runtime.GOMAXPROCS(0) worker goroutines, each encoding into its own bytes.Buffer. In that case, we want to ensure that none of those buffers induces false sharing across cores.
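
For illustration, the naive layout looks something like this (a sketch; the buffers are contiguous in one array, so neighboring Buffer headers can land on the same cache line):

package main

import (
	"bytes"
	"runtime"
	"sync"
)

func main() {
	// Naive layout: the buffers are adjacent in one array, so two
	// workers' Buffer headers can share a cache line and false-share
	// even though they never touch each other's data.
	bufs := make([]bytes.Buffer, runtime.GOMAXPROCS(0))
	var wg sync.WaitGroup
	for i := range bufs {
		wg.Add(1)
		go func(b *bytes.Buffer) {
			defer wg.Done()
			for j := 0; j < 1000; j++ {
				b.WriteString("x") // hot, independent writes
			}
		}(&bufs[i])
	}
	wg.Wait()
}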

I can think of two fairly simple APIs for this: one using finalizers and maps and supporting a resizable set of values, and another simpler API using slices but supporting only a fixed set of allocations.


package cacheline

type Allocator …

// New returns a new value of the given pointer or channel type that does not refer to
// any of the same cache lines as any live values previously returned by the same
// Allocator.
func (a *Allocator) New(t reflect.Type) interface{} {
	…
}

or

package cacheline

// Allocate fills the passed-in slice of pointers or channels with values that refer to
// disjoint cache lines.
func Allocate(slice interface{}) {
  …
}

I have an implementation of the former using a map[uintptr]bool to record the first and last cache line of each allocated value (and setting finalizers to retire entries in the map). It resolves collisions by keeping the colliding values live in a slice and allocating in a loop until it gets a non-collision; with runtime support, it could avoid that overallocation entirely.
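
Roughly, a simplified sketch of that implementation (pointer types only; a fixed 64-byte line is assumed, since the runtime's constant isn't exported; the map is locked because finalizers run on another goroutine):

package cacheline

import (
	"reflect"
	"runtime"
	"sync"
)

const cacheLineSize = 64 // assumed; the real constant is runtime-internal

// Allocator hands out values whose memory shares no cache line with any
// live value it previously returned. Sketch: pointer types only.
type Allocator struct {
	mu   sync.Mutex
	used map[uintptr]bool // first/last cache lines of live allocations
}

func (a *Allocator) New(t reflect.Type) interface{} {
	a.mu.Lock()
	defer a.mu.Unlock()
	if a.used == nil {
		a.used = make(map[uintptr]bool)
	}
	var colliders []reflect.Value // kept live so retries get fresh memory
	for {
		v := reflect.New(t.Elem())
		size := t.Elem().Size()
		if size == 0 {
			size = 1 // treat zero-size values as occupying one byte
		}
		first := v.Pointer() / cacheLineSize
		last := (v.Pointer() + size - 1) / cacheLineSize
		if a.used[first] || a.used[last] {
			colliders = append(colliders, v) // hold it; try again
			continue
		}
		a.used[first], a.used[last] = true, true
		// Retire the map entries when the value becomes unreachable.
		runtime.SetFinalizer(v.Interface(), func(interface{}) {
			a.mu.Lock()
			delete(a.used, first)
			delete(a.used, last)
			a.mu.Unlock()
		})
		return v.Interface()
	}
}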

The latter API is trivial to implement in terms of the former, but could be more memory-efficient as a standalone implementation, since it wouldn't need to retain the collision map over the long term.
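
For example, a sketch of Allocate on top of the Allocator above (pointer slices only; channels omitted):

// Allocate fills a slice of pointers with values on disjoint cache lines.
// Note: the Allocator's finalizers keep its collision map alive until the
// values die; a standalone implementation could avoid that cost.
func Allocate(slice interface{}) {
	var a Allocator
	s := reflect.ValueOf(slice)
	for i := 0; i < s.Len(); i++ {
		v := a.New(s.Index(i).Type()) // element type, e.g. *bytes.Buffer
		s.Index(i).Set(reflect.ValueOf(v))
	}
}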

With runtime support, it would be possible to extend this API to resizable built-in types (maps and slices). I do not see how to make that extension without the cooperation of the runtime.

rsc (Contributor) commented Feb 13, 2017

We've been getting along OK by just making things >=1 cache line in size. It would help here to have concrete examples where this more general operation is necessary.

/cc @aclements @RLH @dr2chase @randall77 @ianlancetaylor

rsc added this to the Proposal milestone Feb 13, 2017
rsc added the Proposal label Feb 13, 2017
randall77 (Contributor) commented
A _ [sys.CacheLineSize]byte field is a pretty concise way of requesting that the allocator avoid shared cache lines for a struct.
It wastes a bit of memory, but not a lot, and it's memory that probably needs to be wasted regardless.
We could export CacheLineSize, I guess.
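
For example (a sketch; sys.CacheLineSize is runtime-internal today, so a hand-written constant stands in):

const cacheLineSize = 64 // assumed; sys.CacheLineSize is not exported

// Adjacent shards in a []shard cannot false-share: the padding makes
// each element span at least one full cache line of its own.
type shard struct {
	count uint64
	_     [cacheLineSize]byte
}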

rsc (Contributor) commented Feb 27, 2017

What I wrote two weeks ago is still true:

We've been getting along OK by just making things >=1 cache line in size. It would help here to have concrete examples where this more general operation is necessary.

@bcmills, do you have data showing the need here? If not, I would suggest we close this issue and leave it for when a real need arises, so that we can do a more accurate cost-benefit analysis. What's here today seems to be all costs (new, complex, subtle API) and no benefits.

It's also not clear that a general solution is needed. As Keith said, we already have a strategy for avoiding shared cache lines in a struct. Someone who wants N bytes.Buffers can already do

type uncachedBuffer struct {
    bytes.Buffer
    pad [cacheLineSize]byte
}

and allocate an array of those. For the specific case of bytes.Buffer, the tmp [64]byte may end up serving well enough as padding anyway, assuming these goroutines are writing enough to overflow into new memory (maybe we should move lastRead up above tmp to be more cache-friendly).
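
For example (a sketch; encode stands in for whatever per-goroutine work is done, and uncachedBuffer is the padded struct above):

func encodeAll(encode func(*bytes.Buffer)) []uncachedBuffer {
	// One padded buffer per worker; the pad keeps each Buffer on
	// cache lines of its own within the backing array.
	bufs := make([]uncachedBuffer, runtime.GOMAXPROCS(0))
	var wg sync.WaitGroup
	for i := range bufs {
		wg.Add(1)
		go func(b *bytes.Buffer) {
			defer wg.Done()
			encode(b)
		}(&bufs[i].Buffer)
	}
	wg.Wait()
	return bufs
}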

Any large enough allocation is cache-line-aligned already, so most map buckets are already in separate cache lines (for example a map[*T]*U bucket is 8*(8+8) = 128 bytes on amd64, where we believe a cache line is 64 bytes). A map header is 48 bytes; we could possibly just add two words to every header if we knew there were important performance wins.

Similarly, a channel header is 96 bytes, so 1.5 cache lines. We could expand it to 2 cache lines if that had important benefits.

All of these possible changes could be done directly, given a clearly demonstrated need, without new API.

bcmills (Contributor, Author) commented Feb 27, 2017

I had thought that I had a good use-case, but when I dug into the details it turned out not to be necessary for the moment. I'm fine with closing this for now (and reopening if/when there is more concrete data supporting a need).

bcmills closed this as completed Feb 27, 2017
golang locked and limited conversation to collaborators Feb 27, 2018