Skip to content
This repository has been archived by the owner on Mar 21, 2024. It is now read-only.

What is the purpose of ordering the updating of the A/D bits by a FENCE. #15

Closed
kdockser opened this issue Aug 22, 2023 · 29 comments
Closed

Comments

@kdockser
Copy link

kdockser commented Aug 22, 2023

The svadu spec states:
The ordering on loads and stores provided by FENCE instructions and the acquire/release bits on atomic instructions also orders the PTE updates associated with those loads and stores as observed by remote harts.

Unfortunately, this removes the ability to speculatively update the A bit. For example, a FENCE may be jumped over by a mispredicted branch and a load could have speculatively occurred. The FENCE will not be seen until the processor redirects, well after the "A" bit had been speculatively set. Either the processor needs a means to roll back the setting of that A bit (this would be nasty), or it cannot speculatively set any A bits.

What is the value of ordering the update of the A and D bits by the FENCE? It is worth the loss of speculatively performing page table walks?

@ved-rivos
Copy link
Collaborator

This does not prevent speculative updates. The A bit can be updated speculatively and when a FENCE is observed any A bit updates performed prior to the FENCE are now in the predecessor set of the FENCE.

@kdockser
Copy link
Author

If an access is after the FENCE, the A bit cannot be updated until the FENCE completes according this Svadu requirement. In the case that I was talking about, the FENCE is speculatively (and incorrectly) branched around. On redirect, the FENCE is discovered but there is no way to roll back the updating of the speculative A bit - this results in a violation of this requirement.

How can A be speculatively updated if such a situation can occur?
What was the purpose of putting this restriction on the updating of the A/D bits?

start:
       Store (X)
       Branch ..., after_fence:  # mispredicted taken
       Fence
after_fence:
       Load (Y)

@ved-rivos
Copy link
Collaborator

I assume the fence orders the store (X) and the load (Y). The text in the spec states that the A and/or D bit update must be observed before the explicit access that caused the A and/or D bit update is observed. If Load(Y) was non-idempotent then it would not have occurred speculatively. If it was to idempotent memory then the load can occur but if that load caused an A bit update then the A bit update must be observed along with the explicit speculative access made by that load - the explicit memory access may be speculative if the PMA allows it to be. The fence should order the A and/or D bit update caused by the store wrt to A bit update that may be caused by the Load and the explicit access caused by subsequent the load.
So I dont see that rule as prohibiting speculative update of A bit. The speculative update of A bit is explicitly allowed by the 3rd paragraph.
@aswaterman @gfavor - Please correct if my interpretation is wrong.

@kdockser
Copy link
Author

kdockser commented Aug 23, 2023

Thanks Ved.
The concern in this scenario is the branch circumvents the Fence and allows the Load(Y) to proceed speculatively - including the speculative update of the A bit in each of the G-stage PTE leaves. When the hart realizes that the branch was mispredicted, it rolls back the Load(Y) but is unable to roll back the update of the associated A bit. It is now possible for the A bit from Load(Y) to be observable before the STORE(X) and its PTE bits - a violation of the FENCE.

The most straightforward way to avoid violating the FENCE is to prohibit such speculative updating of the A bit.

@ved-rivos
Copy link
Collaborator

ved-rivos commented Aug 23, 2023

I do not read "The ordering on loads and stores provided by FENCE instructions and the acquire/release bits on atomic instructions also orders the PTE updates associated with those loads and stores as observed by remote harts." as disallowing speculative A bit update. I interpret "also orders PTE updates associated with those loads" as "if those loads would cause a PTE update then those PTE updates must be ordered". If a predictor warmed the TLB or a speculative execution filled the TLB then those load would not cause an associated A bit update - it is something that has already happened and there is no "PTE update associated with those loads."

@ved-rivos
Copy link
Collaborator

I hope that addresses the concern.

@gfavor
Copy link

gfavor commented Sep 3, 2023

Just to clarify what Ken is getting at and what is the implication of the Svadu text (and I generalize the matter) ...

In general, mis-speculated accesses must ultimately either be "undone" or held off from being performed speculatively. As a result, any such accesses coming before or after a fence have no significance wrt the fence.

But what about speculative A-bit updates by mis-speculated accesses? Must they also be undone? If not, then in Ken's example, the load's A-bit update will have occurred and remained in place after the branch misprediction is recognized and code execution now continues through the fence, i.e. while the load (the second time around) is properly ordered wrt the fence, the A-bit update (caused by the load the first time around) will effectively have occurred before the fence (and hence possibly before the store and before its PTE updates).

Now if the spec A-bit update must also be undone due to the load mis-speculation (the first time around), then this has serious implementation impacts, e.g. either the spec A-bit updates by spec loads can only be done once the load is determined that it is guaranteed to eventually commit and not be aborted away, or the spec A-bit updates must also be buffered - like stores - so that they can be thrown away if and when the load is aborted away.

So, the question to clarify is whether speculative A-bit updates by mis-speculated accesses must also be undone once the explicit access mis-speculation is discovered? Ved's responses seem to suggest that the answer is Yes - which leads into Ken's implementation concerns (and the likely implementation that spec A-bit updates become severely constrained as to when they can be allowed to occur).

I'll leave this issue closed, but a clear yes or no would be good. And maybe an extra non-normative sentence in the spec that says, for example, that "speculative A-bit updates by mis-speculated accesses must also appear to not have been performed after the explicit access mis-speculation is resolved" (if that is the arch intent).

Cc'ing @aswaterman to make sure he is aware of and agrees with whatever answer Ved provides as to the arch intent of Svadu.

@ved-rivos
Copy link
Collaborator

So, the question to clarify is whether speculative A-bit updates by mis-speculated accesses must also be undone once the explicit access mis-speculation is discovered? Ved's responses seem to suggest that the answer is Yes

My answer was No. The architecture allows performing speculative updates A-bit. The A-bit may be the result of bad speculation or it may be due to a prefetcher/predictor that cached these PTEs into the address translation caches.

For the purpose of a fence, these A-bit updates occurred in the past and is thus not "associated with the load or store" that the fence is ordering. There is no reason to undo the update of the A-bit.

@gfavor
Copy link

gfavor commented Sep 3, 2023

Thanks. That's the "right" answer. :) Now the only question is whether a non-normative sentence like the following is warranted to address the kind of thoughts that others, besides Ken, may have:

A speculative A-bit update by a mis-speculated access is allowed to be performed even while the associated access is architecturally never performed.

My own leaning, at the expense of one sentence, would be to include something like this to make it clear for people that may struggle in properly reading and interpreting what the arch spec says. And, for that matter, this statement makes clear just how speculative A-bit updates can be - which is unlike all the rest of the architecture that allows stores (aka updates to memory state) to be architecturally performed only if the instruction causing the store is truly part of the committed program execution. (Having just said that, I now lean more strongly towards wanting to see that extra sentence added to the spec.)

Again cc'ing @aswaterman to make sure he either agrees or disagrees with the view I'm taking. If both of you agree that this sentence should not be added, then I'm ok with that.

@ved-rivos
Copy link
Collaborator

I agree incorporating a sentence like the one mentioned would be beneficial.

Additionally, I'd like to emphasize that "observing a load by a remote hart" for idempotent memory might not be a simple task either. Prefetching data is a common occurrence. When a load accesses data, the observance of that load by another hart — whether through monitoring cache line states, snoops, or other means — may not always be deterministic. A remote hart might not be able to infer whether the load it observed resulted from speculative execution, a prefetch, or if the load was carried out in a non-speculative manner.

@gfavor
Copy link

gfavor commented Sep 4, 2023

Btw, this issue should be reopened until the latest Svadu updates are made (per the last pair of posts by me and Ved).

@ved-rivos ved-rivos reopened this Sep 4, 2023
@kdockser
Copy link
Author

kdockser commented Sep 5, 2023

Thanks Greg and Ved.

As Greg alluded to, there are two kinds of speculation: architectural and microarchitectural. It is easy for a reader to conflate the two and combine the microarchitectural speculative load (for example) and the architectural speculative address translation. Microarchitectural speculative execution needs to act as if it follows the architecture; this means that it may need to roll back any mis-speculated behavior. However, the architectural speculation allows for architecturally visible changes without ever needing to roll back. This architectural speculation allows the A bit to be set on an address translation that is performed in anticipation of a potential subsequent access. It also allows the D bit to be set (in a G-stage PTE) in response to the setting of an A bit (in a VS-stage PTE).

It would add clarity if we changed:
Updates of the A bit may be performed as a result of speculation, but updates to the D bit must be exact (i.e., non-speculative), and observed in program order by the local hart. When two-stage address translation is active, updates of the D bit in G-stage PTEs may be performed as a result of speculative updates of the A bit in VS-stage PTEs.
to:
The A bit may be architecturally set as a result of an address translation that is performed in anticipation of a subsequent access (which might never occur). The D bit may be set by an architecturally executed instruction, in which case it must be observed in program order by the local hart. When two-stage address translation is active, the D bit in G-stage PTEs may be set as a result of setting the A bit in VS-stage PTEs.

The FENCE is not there to prevent the early setting of an A bit associated with an explicit access that is after the FENCE. It is there to ensure (amongst other things) that all required PTE updates for explicit accesses prior to the FENCE have occurred. However, since the PTE updates are already mandated to be observed before the explicit access, a FENCE that enforces these explicit accesses already effectively ensures that the PTE bits will already be visible. Therefore, it seems like this comment about the FENCE should be a non-normative note as it is not providing any new requirements.

@gfavor
Copy link

gfavor commented Sep 6, 2023

How about the following that captures the idea needing clarification while staying closer to a trimmed down version of the existing text:

Updates to the A bit may be performed speculatively, even if the associated memory accesses ultimately are not performed architecturally. Updates to the D bit must only be done non-speculatively and must observed in program order by the local hart. During a two-stage address translation, updates to the D bit in G-stage PTEs may be performed as a result of speculative updates of the A bit in VS-stage PTEs.

@gfavor
Copy link

gfavor commented Sep 6, 2023

Regarding Ken's FENCE comment, it does seem like it's a simple matter of the fence ordering the surrounding explicit accesses and the rest follows, i.e. A/D bits updates by predecessor accesses are also ordered before the fence, A bit updates by successor accesses are not ordered by the fence, and D bit updates by successor accesses are also not ordered by the fence but still must maintain program ordering wrt all surrounding D bit updates. (Note that doing non-speculative D-bit updates by successor accesses doesn't mean that these D-bit updates are guaranteed to be ordered after all predecessor accesses. This is for the same reason that (under RVWMO) a series of non-spec explicit memory accesses to different addresses can go out to the ordering point out-of-order and be globally ordered o-o-o wrt each other.)

So it seems like it would be just a non-normative comment "reminding" people that A/D bit updates due to memory accesses ordered-after the fence are not themselves ordered-after the fence (and D-bit updates only maintain program order wrt surrounding D-bit updates).

@ved-rivos
Copy link
Collaborator

ved-rivos commented Sep 6, 2023

The proposed update looks good to me. I have created PR #18 with the suggested updates. Please review.

@gfavor
Copy link

gfavor commented Sep 6, 2023

Isn't the combination of PR #17 and PR #18 what needs to be reviewed (i.e. you're post only refers to #18)? I looked at both and together they look good and seem to cover everything.

@ved-rivos
Copy link
Collaborator

I withdrew PR #17 since the merge was mixed up for that. PR #18 should be the complete PR.

@gfavor
Copy link

gfavor commented Sep 6, 2023 via email

@ved-rivos
Copy link
Collaborator

Please ignore PR #17 as it had errors and is rejected. The PR #18 is the one addressing this issue.

@gfavor
Copy link

gfavor commented Sep 6, 2023

So are the control bit and two-stage translation changes in the rejected PR #17, part of another PR already going into the spec?

@ved-rivos
Copy link
Collaborator

Please completely ignore #17 - that is a broken PR and has been rejected. The only delta related to the document undergoing public review is in PR #18

@gfavor
Copy link

gfavor commented Sep 6, 2023

So just to confirm, the control bit and two-stage translation chnages are already taken care of elsewhere?

@ved-rivos
Copy link
Collaborator

You may be referring to PR #13 that was the last PR applied before freeze. This had the update to refer to the address translation sections by name instead of number and the renaming of the control bits from HADE to ADUE.

@ved-rivos
Copy link
Collaborator

@gfavor - hope that addressed the question.
@kdockser - does the PR look good to you?

@gfavor
Copy link

gfavor commented Sep 8, 2023

Yes.

@kdockser
Copy link
Author

kdockser commented Sep 8, 2023

The changes look better. However, the updating of the D bit needs clarification. In one sentence it says
updates to the D bit must be exact (i.e., non-speculative), and observed in program order by the local hart.
However, the next sentence says
When two-stage address translation is active, updates of the D bit in G-stage PTEs may be performed as a result of speculative updates of the A bit in VS-stage PTEs.
This seems contradictory as the second sentence implies that the D bit can be updated as a part of a "speculative" access in response to the A bit being set. But, the first sentence says that D updates must be non-speculative and observed in program order by the local hart. This means that the D bit update in response to the A bit update must be ordered after the fence. I think we need something along the lines of:

The D bit of a (V)S-stage PTE may be set by an architecturally executed store-type instruction, in which case it must be observed in program order by the local hart. When two-stage address translation is active, the D bit in G-stage PTEs may be set as a result of setting the A bit in VS-stage PTEs.

Such wording separates out the D bit setting in a VS-stage PTE --- which must be exact, from the D bit setting in a G-stage PTE --- which does not need to be exact.

Also the following still has some ambiguity:
The PTE updates due to memory accesses ordered-after a FENCE are not themselves ordered-after the FENCE.
A slight change removes this ambiguity:
The PTE updates due to memory accesses ordered-after a FENCE are not themselves ordered by the FENCE.

@ved-rivos
Copy link
Collaborator

Thank you, Ken. I understand the source of confusion. When referencing a PTE - whether it's S/VS or G - the D bit must be set precisely when the outcome is from an explicit store. However, for G-stage PTEs, the D bit can be speculatively set due to an implicit store triggered by setting the A bit in the VS-stage PTEs. To clarify this nuance, I suggest the following revision: "Updates to the D bit , resulting from an explicit store, must be exact (i.e., non-speculative)."

Kindly review the updated PR for further details.

@ved-rivos
Copy link
Collaborator

@kdockser - hope this update addressed the concern.

@kdockser
Copy link
Author

Thanks Ved. The changes add a lot of clarity.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants