What is the purpose of ordering the updating of the A/D bits by a FENCE. #15
Comments
This does not prevent speculative updates. The A bit can be updated speculatively and when a FENCE is observed any A bit updates performed prior to the FENCE are now in the predecessor set of the FENCE. |
If an access is after the FENCE, the A bit cannot be updated until the FENCE completes, according to this Svadu requirement. In the case that I was talking about, the FENCE is speculatively (and incorrectly) branched around. On redirect, the FENCE is discovered, but there is no way to roll back the speculative A bit update; this results in a violation of this requirement. How can A be speculatively updated if such a situation can occur?
|
I assume the fence orders the store (X) and the load (Y). The text in the spec states that the A and/or D bit update must be observed before the explicit access that caused the A and/or D bit update is observed. If Load(Y) was to non-idempotent memory then it would not have occurred speculatively. If it was to idempotent memory then the load can occur, but if that load caused an A bit update then the A bit update must be observed along with the explicit speculative access made by that load; the explicit memory access may be speculative if the PMA allows it to be. The fence should order the A and/or D bit update caused by the store wrt the A bit update that may be caused by the load and the explicit access caused by the subsequent load. |
Thanks Ved. The most straightforward way to avoid violating the FENCE is to prohibit such speculative updating of the A bit. |
I do not read "The ordering on loads and stores provided by FENCE instructions and the acquire/release bits on atomic instructions also orders the PTE updates associated with those loads and stores as observed by remote harts." as disallowing speculative A bit update. I interpret "also orders PTE updates associated with those loads" as "if those loads would cause a PTE update then those PTE updates must be ordered". If a predictor warmed the TLB or a speculative execution filled the TLB then those loads would not cause an associated A bit update; it is something that has already happened and there is no "PTE update associated with those loads." |
I hope that addresses the concern. |
Just to clarify what Ken is getting at and what is the implication of the Svadu text (and I generalize the matter) ... In general, mis-speculated accesses must ultimately either be "undone" or held off from being performed speculatively. As a result, any such accesses coming before or after a fence have no significance wrt the fence. But what about speculative A-bit updates by mis-speculated accesses? Must they also be undone? If not, then in Ken's example, the load's A-bit update will have occurred and remained in place after the branch misprediction is recognized and code execution now continues through the fence, i.e. while the load (the second time around) is properly ordered wrt the fence, the A-bit update (caused by the load the first time around) will effectively have occurred before the fence (and hence possibly before the store and before its PTE updates). Now if the spec A-bit update must also be undone due to the load mis-speculation (the first time around), then this has serious implementation impacts, e.g. either the spec A-bit updates by spec loads can only be done once the load is determined to be guaranteed to eventually commit and not be aborted away, or the spec A-bit updates must also be buffered, like stores, so that they can be thrown away if and when the load is aborted away. So, the question to clarify is whether speculative A-bit updates by mis-speculated accesses must also be undone once the explicit access mis-speculation is discovered. Ved's responses seem to suggest that the answer is Yes, which leads into Ken's implementation concerns (and the likely implementation consequence that spec A-bit updates become severely constrained as to when they can be allowed to occur). I'll leave this issue closed, but a clear yes or no would be good.
And maybe an extra non-normative sentence in the spec that says, for example, that "speculative A-bit updates by mis-speculated accesses must also appear to not have been performed after the explicit access mis-speculation is resolved" (if that is the arch intent). Cc'ing @aswaterman to make sure he is aware of and agrees with whatever answer Ved provides as to the arch intent of Svadu. |
My answer was No. The architecture allows performing speculative updates of the A-bit. The A-bit update may be the result of bad speculation or it may be due to a prefetcher/predictor that cached these PTEs into the address translation caches. For the purpose of a fence, these A-bit updates occurred in the past and are thus not "associated with the load or store" that the fence is ordering. There is no reason to undo the update of the A-bit. |
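To make the answer above concrete, here is a minimal toy sketch in Python of Ken's scenario. It is not taken from the Svadu spec; `ToyHart`, `load`, and `squash` are hypothetical names invented for illustration, and the model tracks only a single PTE's A bit. It shows the speculative load setting the A bit, the squash leaving it in place, and the re-executed load (now after the FENCE) finding no PTE update left to associate with it.

```python
from dataclasses import dataclass, field


@dataclass
class PTE:
    accessed: bool = False  # the A bit


@dataclass
class ToyHart:
    """Hypothetical model: one PTE plus a log of what each walk did."""
    pte: PTE = field(default_factory=PTE)
    log: list = field(default_factory=list)

    def load(self, speculative: bool) -> bool:
        """Translate for a load; return True if this walk updated the A bit."""
        updated = not self.pte.accessed
        if updated:
            self.pte.accessed = True  # performed even when speculative
        self.log.append(("spec" if speculative else "arch", updated))
        return updated

    def squash(self) -> None:
        """Discard mis-speculated work; the A bit is deliberately not restored."""
        pass


hart = ToyHart()
first = hart.load(speculative=True)     # load runs down the mispredicted path
hart.squash()                           # branch resolves, the FENCE is now fetched
second = hart.load(speculative=False)   # load re-executes after the FENCE
print(first, second, hart.pte.accessed)  # True False True
```

In this sketch the architecturally performed load causes no A-bit update at all, which is the sense in which the earlier update is "in the past" from the fence's point of view.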
Thanks. That's the "right" answer. :) Now the only question is whether a non-normative sentence like the following is warranted to address the kind of thoughts that others, besides Ken, may have: A speculative A-bit update by a mis-speculated access is allowed to be performed even while the associated access is architecturally never performed. My own leaning, at the expense of one sentence, would be to include something like this to make it clear for people that may struggle in properly reading and interpreting what the arch spec says. And, for that matter, this statement makes clear just how speculative A-bit updates can be - which is unlike all the rest of the architecture that allows stores (aka updates to memory state) to be architecturally performed only if the instruction causing the store is truly part of the committed program execution. (Having just said that, I now lean more strongly towards wanting to see that extra sentence added to the spec.) Again cc'ing @aswaterman to make sure he either agrees or disagrees with the view I'm taking. If both of you agree that this sentence should not be added, then I'm ok with that. |
I agree incorporating a sentence like the one mentioned would be beneficial. Additionally, I'd like to emphasize that "observing a load by a remote hart" for idempotent memory might not be a simple task either. Prefetching data is a common occurrence. When a load accesses data, the observance of that load by another hart — whether through monitoring cache line states, snoops, or other means — may not always be deterministic. A remote hart might not be able to infer whether the load it observed resulted from speculative execution, a prefetch, or if the load was carried out in a non-speculative manner. |
Btw, this issue should be reopened until the latest Svadu updates are made (per the last pair of posts by me and Ved). |
Thanks Greg and Ved. As Greg alluded to, there are two kinds of speculation: architectural and microarchitectural. It is easy for a reader to conflate the two and combine the microarchitectural speculative load (for example) and the architectural speculative address translation. Microarchitectural speculative execution needs to act as if it follows the architecture; this means that it may need to roll back any mis-speculated behavior. However, the architectural speculation allows for architecturally visible changes without ever needing to roll back. This architectural speculation allows the A bit to be set on an address translation that is performed in anticipation of a potential subsequent access. It also allows the D bit to be set (in a G-stage PTE) in response to the setting of an A bit (in a VS-stage PTE). It would add clarity if we changed: The FENCE is not there to prevent the early setting of an A bit associated with an explicit access that is after the FENCE. It is there to ensure (amongst other things) that all required PTE updates for explicit accesses prior to the FENCE have occurred. However, since the PTE updates are already mandated to be observed before the explicit access, a FENCE that enforces these explicit accesses already effectively ensures that the PTE bits will already be visible. Therefore, it seems like this comment about the FENCE should be a non-normative note as it is not providing any new requirements. |
How about the following that captures the idea needing clarification while staying closer to a trimmed down version of the existing text: Updates to the A bit may be performed speculatively, even if the associated memory accesses ultimately are not performed architecturally. Updates to the D bit must only be done non-speculatively and must be observed in program order by the local hart. During a two-stage address translation, updates to the D bit in G-stage PTEs may be performed as a result of speculative updates of the A bit in VS-stage PTEs. |
Regarding Ken's FENCE comment, it does seem like it's a simple matter of the fence ordering the surrounding explicit accesses and the rest follows, i.e. A/D bits updates by predecessor accesses are also ordered before the fence, A bit updates by successor accesses are not ordered by the fence, and D bit updates by successor accesses are also not ordered by the fence but still must maintain program ordering wrt all surrounding D bit updates. (Note that doing non-speculative D-bit updates by successor accesses doesn't mean that these D-bit updates are guaranteed to be ordered after all predecessor accesses. This is for the same reason that (under RVWMO) a series of non-spec explicit memory accesses to different addresses can go out to the ordering point out-of-order and be globally ordered o-o-o wrt each other.) So it seems like it would be just a non-normative comment "reminding" people that A/D bit updates due to memory accesses ordered-after the fence are not themselves ordered-after the fence (and D-bit updates only maintain program order wrt surrounding D-bit updates). |
The proposed update looks good to me. I have created PR #18 with the suggested updates. Please review. |
So are the control bit and two-stage translation changes in the rejected PR #17, part of another PR already going into the spec? |
So just to confirm, the control bit and two-stage translation changes are already taken care of elsewhere? |
You may be referring to PR #13 that was the last PR applied before freeze. This had the update to refer to the address translation sections by name instead of number and the renaming of the control bits from HADE to ADUE. |
Yes. |
The changes look better. However, the updating of the D bit needs clarification. In one sentence it says
Such wording separates out the D bit setting in a VS-stage PTE --- which must be exact, from the D bit setting in a G-stage PTE --- which does not need to be exact. Also the following still has some ambiguity: |
Thank you, Ken. I understand the source of confusion. When referencing a PTE, whether it's S/VS or G, the D bit must be set precisely when the outcome is from an explicit store. However, for G-stage PTEs, the D bit can be speculatively set due to an implicit store triggered by setting the A bit in the VS-stage PTEs. To clarify this nuance, I suggest the following revision: "Updates to the D bit, resulting from an explicit store, must be exact (i.e., non-speculative)." Kindly review the updated PR for further details. |
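The distinction drawn above can be sketched with a small toy model in Python. This is only an illustration under stated assumptions, not the spec's algorithm: `walk` is a hypothetical function, `vs` stands for the VS-stage PTE of the accessed page, and `g` stands for the G-stage PTE that maps the page holding that VS-stage PTE (the G-stage PTE for the data page itself is left out for brevity).

```python
from dataclasses import dataclass


@dataclass
class PTE:
    a: bool = False  # Accessed bit
    d: bool = False  # Dirty bit


def walk(vs: PTE, g: PTE, is_store: bool, speculative: bool) -> None:
    """Hypothetical two-stage walk; g maps the page that holds vs."""
    if not vs.a:
        vs.a = True  # the A update may be performed speculatively
        g.a = True   # writing the VS-stage PTE is an implicit access ...
        g.d = True   # ... and an implicit store, so G-stage D may be set too
    if is_store and not speculative:
        vs.d = True  # D resulting from an explicit store must be exact


vs_pte, g_pte = PTE(), PTE()
walk(vs_pte, g_pte, is_store=True, speculative=True)
spec_state = (vs_pte.a, vs_pte.d, g_pte.d)   # (True, False, True)
walk(vs_pte, g_pte, is_store=True, speculative=False)
final_state = (vs_pte.a, vs_pte.d, g_pte.d)  # (True, True, True)
```

After the speculative store, only the A bit and the G-stage D bit have changed; the VS-stage D bit is set only once the store is performed non-speculatively.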
@kdockser - hope this update addressed the concern. |
Thanks Ved. The changes add a lot of clarity. |
The svadu spec states:
The ordering on loads and stores provided by FENCE instructions and the acquire/release bits on atomic instructions also orders the PTE updates associated with those loads and stores as observed by remote harts.
Unfortunately, this removes the ability to speculatively update the A bit. For example, a FENCE may be jumped over by a mispredicted branch and a load could have speculatively occurred. The FENCE will not be seen until the processor redirects, well after the "A" bit had been speculatively set. Either the processor needs a means to roll back the setting of that A bit (this would be nasty), or it cannot speculatively set any A bits.
What is the value of ordering the update of the A and D bits by the FENCE? Is it worth the loss of speculatively performing page table walks?