Implement volatile barrier APIs #107843

hamarb123 · 2024-09-15T23:37:39Z

This implements the proposed Read-ReadWrite and ReadWrite-Write barriers. Note: I haven't implemented any tests yet.

/cc @jkotas @VSadov @kouvel

Now that I'm on the correct branch
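For readers more familiar with C++, here is a rough analogy. It is an assumption and not something this PR states: the Read-ReadWrite barrier behaves like an acquire fence, and the ReadWrite-Write barrier behaves like a release fence. The sketch below uses those fences in a message-passing pattern. The helper names `read_barrier`, `write_barrier`, `producer`, `consumer` and `run_once` are hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical stand-ins for the proposed barriers. Assumption: Read-ReadWrite
// behaves like a C++ acquire fence and ReadWrite-Write like a release fence.
inline void read_barrier() { std::atomic_thread_fence(std::memory_order_acquire); }
inline void write_barrier() { std::atomic_thread_fence(std::memory_order_release); }

std::atomic<int> payload{0};
std::atomic<bool> ready{false};

// Writer: publish the payload, then raise the flag.
void producer() {
    payload.store(42, std::memory_order_relaxed);
    write_barrier();  // earlier reads/writes may not move past later writes
    ready.store(true, std::memory_order_relaxed);
}

// Reader: wait for the flag, then read the payload.
int consumer() {
    while (!ready.load(std::memory_order_relaxed)) {
        // spin until the flag becomes visible
    }
    read_barrier();  // the flag read may not move past later reads/writes
    return payload.load(std::memory_order_relaxed);
}

// Runs one producer/consumer round and returns what the consumer observed.
int run_once() {
    payload.store(0, std::memory_order_relaxed);
    ready.store(false, std::memory_order_relaxed);
    int seen = -1;
    std::thread reader([&seen] { seen = consumer(); });
    std::thread writer(producer);
    writer.join();
    reader.join();
    return seen;
}
```

When the reader observes `ready == true`, the release fence in the writer and the acquire fence in the reader guarantee that the reader sees `payload == 42`. For example, `for (int i = 0; i < 1000; ++i) assert(run_once() == 42);` should never fire.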

dotnet-issue-labeler · 2024-09-15T23:37:48Z

Note regarding the new-api-needs-documentation label:

This serves as a reminder for when your PR is modifying a ref *.cs file and adding/modifying public APIs, please make sure the API implementation in the src *.cs file is documented with triple slash comments, so the PR reviewers can sign off that change.

VSadov · 2024-09-16T01:06:43Z

> My understanding of the x86 memory model is that it allows reads to be reordered after writes (when they are to different addresses), which is not supposed to be allowed across a Volatile.WriteBarrier with the ReadWrite-Write model, if I'm understanding correctly.

Reads are not reordered after writes on x86.
Reads can happen earlier than preceding writes (i.e. prefetch), and to prevent that you'd indeed need a full fence, but that is not something that WriteBarrier needs to guarantee.

In short, WriteBarrier needs to wait for in-progress reads and writes to complete before allowing further writes.

On x86/x64, Volatile.WriteBarrier is just a compiler fence, similar to Volatile.ReadBarrier.
On Arm it is a full fence (dmb [ish]). Sadly, dmb st does not wait for reads.

hamarb123 · 2024-09-16T01:32:54Z

Hmm, I still don't understand. What's the difference between:

...x
Volatile.ReadBarrier(); //emits nothing
Volatile.WriteBarrier(); //emits nothing
...y

And

...x
Interlocked.MemoryBarrier(); //emits lock ...
...y

@VSadov, can you give me an example of some x and y where the behaviour is allowed to differ, so I can see what I'm not understanding?

Edit: I think I understand now (leaving this here for my future reference)

read a
write b
Volatile.ReadBarrier(); Volatile.WriteBarrier(); //or swapped
read c
write d

could be reordered to: read a, read c, write b, write d (read c moving ahead of write b), whereas Interlocked.MemoryBarrier() would obviously prevent this reordering.
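The reordering described above, where a later read moves ahead of an earlier write, is the classic store-buffering litmus test. The sketch below is a C++ analogy under the same assumption as before (acquire fence ≈ ReadBarrier, release fence ≈ WriteBarrier, seq_cst fence ≈ Interlocked.MemoryBarrier); the function names are hypothetical. With seq_cst fences, the outcome `r1 == 0 && r2 == 0` is forbidden. With an acquire fence followed by a release fence, it is *permitted*, but whether you ever observe it depends on the hardware and timing.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

std::atomic<int> x{0}, y{0};

// full == true: seq_cst fence (stand-in for Interlocked.MemoryBarrier).
// full == false: acquire + release fences (stand-in for ReadBarrier + WriteBarrier).
void fence(bool full) {
    if (full) {
        std::atomic_thread_fence(std::memory_order_seq_cst);
    } else {
        std::atomic_thread_fence(std::memory_order_acquire);
        std::atomic_thread_fence(std::memory_order_release);
    }
}

// One store-buffering round: each thread writes one location, then reads the other.
// Returns true if both reads saw 0, i.e. each read was reordered ahead of its thread's write.
bool both_zero(bool full) {
    x.store(0, std::memory_order_relaxed);
    y.store(0, std::memory_order_relaxed);
    int r1 = -1, r2 = -1;
    std::thread t1([&] {
        x.store(1, std::memory_order_relaxed);   // like "write b"
        fence(full);
        r1 = y.load(std::memory_order_relaxed);  // like "read c"
    });
    std::thread t2([&] {
        y.store(1, std::memory_order_relaxed);
        fence(full);
        r2 = x.load(std::memory_order_relaxed);
    });
    t1.join();
    t2.join();
    return r1 == 0 && r2 == 0;
}

// Counts how many of `runs` rounds produced r1 == r2 == 0.
int count_both_zero(bool full, int runs) {
    int hits = 0;
    for (int i = 0; i < runs; ++i) {
        hits += both_zero(full) ? 1 : 0;
    }
    return hits;
}
```

`count_both_zero(true, n)` must always return 0. `count_both_zero(false, n)` may return a non-zero count, so it is only meaningful as an observation and not as an assertion.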

hamarb123 · 2024-09-16T02:37:26Z

> On Arm it is a full fence (dmb [ish]). Sadly, dmb st does not wait for reads.

Would dmb ishst + dmb ishld be enough? (idk if this is actually faster anyway, but maybe it is?)
It would give Load-Load, Load-Store, and Store-Store guarantees according to this.

Using my example from earlier: read a, write b, barrier(s), read c, write d (where each of these represents an arbitrary number of reads and writes in any order):

Volatile.WriteBarrier requires: a,b before d
dmb ishst gives b before d
dmb ishld gives a before c,d

So it seems to me that dmb ishst + dmb ishld (in either order) should theoretically be enough. Whether it's faster than a plain dmb ish is obviously another question (and one that only matters if the approach is valid in the first place). If there's something wrong with my analysis, please let me know :)
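The analysis above can be checked mechanically by treating each barrier as a set of "access before → access after" ordering guarantees. This is a simplified model, and the names are hypothetical. It encodes dmb ishld as Load-Load plus Load-Store ("a before c,d"), dmb ishst as Store-Store ("b before d"), and dmb ish as all four orderings. It also encodes the proposed barriers' requirements: Read-ReadWrite needs Load-Load and Load-Store, and ReadWrite-Write needs Load-Store and Store-Store.

```cpp
#include <cassert>
#include <cstdint>

// Each bit is one ordering guarantee: "access before the barrier -> access after it".
enum Order : std::uint8_t {
    LoadLoad = 1 << 0,
    LoadStore = 1 << 1,
    StoreLoad = 1 << 2,
    StoreStore = 1 << 3,
};

// Guarantees provided by the Arm64 barrier variants discussed above.
constexpr std::uint8_t kDmbIshld = LoadLoad | LoadStore;  // earlier loads vs. later loads/stores
constexpr std::uint8_t kDmbIshst = StoreStore;            // earlier stores vs. later stores
constexpr std::uint8_t kDmbIsh = LoadLoad | LoadStore | StoreLoad | StoreStore;

// Requirements of the proposed barriers.
constexpr std::uint8_t kReadBarrier = LoadLoad | LoadStore;     // Read-ReadWrite
constexpr std::uint8_t kWriteBarrier = LoadStore | StoreStore;  // ReadWrite-Write

// True if `provided` includes every guarantee in `required`.
constexpr bool satisfies(std::uint8_t provided, std::uint8_t required) {
    return (provided & required) == required;
}

// Guarantees in `required` that `provided` does not give.
constexpr std::uint8_t missing(std::uint8_t provided, std::uint8_t required) {
    return static_cast<std::uint8_t>(required & ~provided);
}

constexpr std::uint8_t kPair = kDmbIshld | kDmbIshst;  // dmb ishst + dmb ishld

static_assert(satisfies(kPair, kWriteBarrier), "the pair covers ReadWrite-Write");
static_assert(!satisfies(kDmbIshst, kWriteBarrier), "ishst alone does not wait for reads");
static_assert(satisfies(kDmbIshld, kReadBarrier), "ishld covers Read-ReadWrite in this model");
static_assert(missing(kPair, kDmbIsh) == StoreLoad, "the pair still lacks Store-Load");
```

The last check reflects the follow-up discussion: the pair is weaker than a full dmb ish only in the Store-Load ordering ("b before c"), which the ReadWrite-Write barrier does not require.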

VSadov · 2024-09-16T03:56:12Z

> Would dmb ishst + dmb ishld be enough? (idk if this is actually faster anyway, but maybe it is?)

At first glance it seems that the combination is as good as a full barrier.
If the cost of a full barrier could be reduced by doing two half barriers instead, I'd think that is how hardware would do it, so likely it is not faster.

hamarb123 · 2024-09-16T03:58:18Z

> At first glance it seems that the combination is as good as a full barrier. If the cost of a full barrier could be reduced by doing two half barriers instead, I'd think that is how hardware would do it, so likely it is not faster.

It's actually not as strong as a full barrier, since it doesn't order b before c, which, based on what you were saying, I think is the same thing x86 doesn't guarantee by default.

VSadov · 2024-09-16T04:22:34Z

>> At first glance it seems that the combination is as good as a full barrier. If the cost of a full barrier could be reduced by doing two half barriers instead, I'd think that is how hardware would do it, so likely it is not faster.
>
> It's actually not as strong as a full barrier, since it doesn't give b before c, which is the same thing that x86 doesn't give by default I think based on what you were saying.

Ah, right, it still does not order Store-Load. It could be cheaper then, since it guarantees less.

hamarb123 · 2024-09-19T11:04:54Z

> Ah, right, it still does not order Store-Load. It could be cheaper then, since it guarantees less.

I did some testing based on the code I gave in the use-case section of my API proposal issue (converted to C++) on an M-series MacBook, and got about a 1.4% regression overall (testing ran for about 25 minutes in total). Don't interpret that as the pair being exactly 1.4% slower than a plain dmb ish, since there is obviously other code around the dmb instructions; it's just most likely not faster. So there's probably no point pursuing this idea further. Not overly surprised, but anyway.

hamarb123 added 2 commits September 16, 2024 08:49

Initial commit

aee00b5

Follow-up commit

892d7cd

Now that I'm on the correct branch

hamarb123 requested review from lambdageek and steveisok as code owners September 15, 2024 23:37

dotnet-issue-labeler bot added area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI new-api-needs-documentation labels Sep 15, 2024

dotnet-policy-service bot added the community-contribution Indicates that the PR has been added by a community member label Sep 15, 2024

jit-format

5fc6d75

Implement Feedback

0b5d490

- And fix missed file from jit-format

build-analysis bot mentioned this pull request Sep 16, 2024

restarted. Azure DevOps can't recover from restarts. dotnet/dnceng#3879

This was referenced Sep 16, 2024

[browser] Unable to evaluate script: tab crashed #103623

SpawnBrowserAsync #107771

hamarb123 added 2 commits September 19, 2024 21:20

Fix typos & use appropriate barrier type in mono

9a103d1

Use optimised dmb on arm64 where possible on mini-mono runtime

9cf3fd6

- And fix up some FIXMEs & comments re cpobj and cpblk
