
proposal: cacheline-diverse allocation #19025

Closed

bcmills opened this issue Feb 10, 2017 · 4 comments

Comments

bcmills (Contributor) commented Feb 10, 2017

For certain applications (e.g. most applications of #18802, some possible implementations for #18177, and basically anything that farms out work to multiple goroutines in parallel), it is useful to be able to allocate a set of values with the constraint that they occupy disjoint cache lines.

For example, one might spin up runtime.GOMAXPROCS(0) worker goroutines, each encoding into its own bytes.Buffer. In that case, we want to ensure that none of those buffers induces false sharing across cores.
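
For illustration, the naive layout looks something like this (a sketch; the buffers are contiguous in one array, so neighboring Buffer headers can land on the same cache line):

package main

import (
	"bytes"
	"runtime"
	"sync"
)

func main() {
	// Naive layout: the buffers are adjacent in one array, so two
	// workers' Buffer headers can share a cache line and false-share
	// even though they never touch each other's data.
	bufs := make([]bytes.Buffer, runtime.GOMAXPROCS(0))
	var wg sync.WaitGroup
	for i := range bufs {
		wg.Add(1)
		go func(b *bytes.Buffer) {
			defer wg.Done()
			for j := 0; j < 1000; j++ {
				b.WriteString("x") // hot, independent writes
			}
		}(&bufs[i])
	}
	wg.Wait()
}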

I can think of two fairly simple APIs for this: one using finalizers and maps and supporting a resizable set of values, and another simpler API using slices but supporting only a fixed set of allocations.


package cacheline

type Allocator …

// New returns a new value of the given pointer or channel type that does not refer to
// any of the same cache lines as any live values previously returned by the same
// Allocator.
func (a *Allocator) New(t reflect.Type) interface{} {
	…
}

or

package cacheline

// Allocate fills the passed-in slice of pointers or channels with values that refer to
// disjoint cache lines.
func Allocate(slice interface{}) {
  …
}

I have an implementation of the former using a map[uintptr]bool to record the first and last cache line of each allocated value (and setting finalizers to retire entries in the map). It resolves collisions by keeping the colliding values live in a slice and allocating in a loop until it gets a non-collision; with runtime support, it could avoid that overallocation entirely.
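
Roughly, a simplified sketch of that implementation (pointer types only; a fixed 64-byte line is assumed, since the runtime's constant isn't exported; the map is locked because finalizers run on another goroutine):

package cacheline

import (
	"reflect"
	"runtime"
	"sync"
)

const cacheLineSize = 64 // assumed; the real constant is runtime-internal

// Allocator hands out values whose memory shares no cache line with any
// live value it previously returned. Sketch: pointer types only.
type Allocator struct {
	mu   sync.Mutex
	used map[uintptr]bool // first/last cache lines of live allocations
}

func (a *Allocator) New(t reflect.Type) interface{} {
	a.mu.Lock()
	defer a.mu.Unlock()
	if a.used == nil {
		a.used = make(map[uintptr]bool)
	}
	var colliders []reflect.Value // kept live so retries get fresh memory
	for {
		v := reflect.New(t.Elem())
		size := t.Elem().Size()
		if size == 0 {
			size = 1 // treat zero-size values as occupying one byte
		}
		first := v.Pointer() / cacheLineSize
		last := (v.Pointer() + size - 1) / cacheLineSize
		if a.used[first] || a.used[last] {
			colliders = append(colliders, v) // hold it; try again
			continue
		}
		a.used[first], a.used[last] = true, true
		// Retire the map entries when the value becomes unreachable.
		runtime.SetFinalizer(v.Interface(), func(interface{}) {
			a.mu.Lock()
			delete(a.used, first)
			delete(a.used, last)
			a.mu.Unlock()
		})
		return v.Interface()
	}
}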

The latter API is trivial to implement in terms of the former, but could be more memory-efficient as a standalone implementation, since it wouldn't need to retain the collision map over the long term.
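
For example, a sketch of Allocate on top of the Allocator above (pointer slices only; channels omitted):

// Allocate fills a slice of pointers with values on disjoint cache lines.
// Note: the Allocator's finalizers keep its collision map alive until the
// values die; a standalone implementation could avoid that cost.
func Allocate(slice interface{}) {
	var a Allocator
	s := reflect.ValueOf(slice)
	for i := 0; i < s.Len(); i++ {
		v := a.New(s.Index(i).Type()) // element type, e.g. *bytes.Buffer
		s.Index(i).Set(reflect.ValueOf(v))
	}
}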

With runtime support, it would be possible to extend this API to resizable built-in types (maps and slices). I do not see how to make that extension without the cooperation of the runtime.

rsc (Contributor) commented Feb 13, 2017

We've been getting along OK by just making things >=1 cache line in size. It would help here to have concrete examples where this more general operation is necessary.

/cc @aclements @RLH @dr2chase @randall77 @ianlancetaylor

rsc added this to the Proposal milestone Feb 13, 2017
rsc added the Proposal label Feb 13, 2017
randall77 (Contributor) commented
A _ [sys.CacheLineSize]byte field is a pretty concise way of requesting that the allocator avoid shared cache lines for a struct.
It wastes a bit of memory, but not a lot, and it's memory that probably needs to be wasted regardless.
We could export CacheLineSize, I guess.
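
For example (a sketch; sys.CacheLineSize is runtime-internal today, so a hand-written constant stands in):

const cacheLineSize = 64 // assumed; sys.CacheLineSize is not exported

// Adjacent shards in a []shard cannot false-share: the padding makes
// each element span at least one full cache line of its own.
type shard struct {
	count uint64
	_     [cacheLineSize]byte
}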

rsc (Contributor) commented Feb 27, 2017

What I wrote two weeks ago is still true:

We've been getting along OK by just making things >=1 cache line in size. It would help here to have concrete examples where this more general operation is necessary.

@bcmills, do you have data showing the need here? If not, I would suggest we close this issue and leave it for when a real need arises, so that we can do a more accurate cost-benefit analysis. What's here today seems to be all costs (new, complex, subtle API) and no benefits.

It's also not clear that a general solution is needed. As Keith said, we already have a strategy for avoiding shared cache lines in a struct. Someone who wants N bytes.Buffers can already do

type uncachedBuffer struct {
    bytes.Buffer
    pad [cacheLineSize]byte
}

and allocate an array of those. For the specific case of bytes.Buffer, the tmp [64]byte may end up serving well enough as padding anyway, assuming these goroutines are writing enough to overflow into new memory (maybe we should move lastRead up above tmp to be more cache-friendly).
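
For example (a sketch; encode stands in for whatever per-goroutine work is done, and uncachedBuffer is the padded struct above):

func encodeAll(encode func(*bytes.Buffer)) []uncachedBuffer {
	// One padded buffer per worker; the pad keeps each Buffer on
	// cache lines of its own within the backing array.
	bufs := make([]uncachedBuffer, runtime.GOMAXPROCS(0))
	var wg sync.WaitGroup
	for i := range bufs {
		wg.Add(1)
		go func(b *bytes.Buffer) {
			defer wg.Done()
			encode(b)
		}(&bufs[i].Buffer)
	}
	wg.Wait()
	return bufs
}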

Any large enough allocation is cache-line-aligned already, so most map buckets are already in separate cache lines (for example a map[*T]*U bucket is 8*(8+8) = 128 bytes on amd64, where we believe a cache line is 64 bytes). A map header is 48 bytes; we could possibly just add two words to every header if we knew there were important performance wins.

Similarly, a channel header is 96 bytes, so 1.5 cache lines. We could expand it to 2 cache lines if that had important benefits.

All of these possible changes could be done directly, given a clearly demonstrated need, without new API.

bcmills (Contributor, Author) commented Feb 27, 2017

I had thought that I had a good use-case, but when I dug into the details it turned out not to be necessary for the moment. I'm fine with closing this for now (and reopening if/when there is more concrete data supporting a need).

bcmills closed this as completed Feb 27, 2017
golang locked and limited conversation to collaborators Feb 27, 2018