Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DCache does not block Acquire/ProbeAck by outstanding Release #2832

Merged
merged 4 commits into from
May 4, 2021

Conversation

ingallsj
Copy link
Contributor

@ingallsj ingallsj commented May 3, 2021

TileLink includes these message ordering rules:

Release
Once the Release is issued, the master should not issue a ProbeAck, ProbeAckData, Acquire, Get, Put, ArithmeticData, LogicalData, or further Release on the block until it receives a ReleaseAck from the slave acknowledging completion of the writeback.

Related issue: The Rocket/Bullet DCache relaxed this in #2696 to allow a ProbeAck or Acquire to overtake a Release, but not overtake a ReleaseData.

Type of change: bug report

Impact: functional change

Development Phase: implementation

@ingallsj ingallsj requested a review from aswaterman May 3, 2021 03:47
@aswaterman
Copy link
Member

Paging all this back in, I think this PR will have a significant impact on our memory-latency benchmarks.

  • Removing release_ack_dirty from the tl.a.valid will prevent issuing a cache miss while the drop of an unrelated clean line is pending, which is unnecessary serialization.
  • The reason I added release_ack_dirty to the block_probe logic is that it fixed a perf problem with back-probes. The issue is that we don't always perform noisy drops: if there's already a pending release that hasn't been acked, we instead do a silent drop, because otherwise we'd have to stall. So back-probes do happen. And when they do, the address comparison always matches, because of course the address being probed has the same index as the most recently dropped line.

Unless I'm missing something, we should just undo the change to tl.a.valid.

As for the second change, I see a few options:

  • Include more bits in the address comparison to eliminate most false positives (say, an additional pgLevelBits).
  • Change the TL spec as discussed on the ticket.

In the interest of forward progress, I'll suggest backing out the tl.a.valid change, and changing the comparison logic to use bits ((pgIdxBits + pgLevelBits) min paddrBits) - 1 .. idxLSB. That should work OK in practice without coupling this PR to the TL discussion.

Of course, I'm open to alternate suggestions, and if it appears the TL spec will actually change, it certainly would be nice to make this whole point moot.

@ingallsj
Copy link
Contributor Author

ingallsj commented May 3, 2021

  • Removing release_ack_dirty from the tl.a.valid will prevent issuing a cache miss while the drop of an unrelated clean line is pending, which is unnecessary serialization.
    Unless I'm missing something, we should just undo the change to tl.a.valid.
    In the interest of forward progress, I'll suggest backing out the tl.a.valid change,

Is this scenario handled elsewhere? Issuing a cache miss (AcquireBlock) while the drop (Release) of the same clean line is pending. If yes, then I agree, I should not have put in the tl.a.valid change. If not, then the race starts looking similar for Acquire-Release as ProbeAck-Release.

The reason I added release_ack_dirty to the block_probe logic is that it fixed a perf problem with back-probes. The issue is that we don't always perform noisy drops: if there's already a pending release that hasn't been acked, we instead do a silent drop, because otherwise we'd have to stall. So back-probes do happen. And when they do, the address [index] comparison always matches, because of course the address being probed has the same index as the most recently dropped line.
As for the second change, I see a few options:
Include more bits in the address comparison to eliminate most false positives (say, an additional pgLevelBits).
In the interest of forward progress, I'll suggest backing out the tl.a.valid change, and changing the comparison logic to use bits ((pgIdxBits + pgLevelBits) min paddrBits) - 1 .. idxLSB. That should work OK in practice without coupling this PR to the TL discussion.

Will do now.

…comparison to eliminate most false positives
@aswaterman
Copy link
Member

@ingallsj Hmm, I think you might be right about the tl.a.valid case. Ugh.

Perhaps we stick with your change but add an address check--I think comparing the same set of bits between release_ack_addr and s2_req.addr would suffice. Agreed?

@ingallsj ingallsj merged commit f2f3a1b into master May 4, 2021
@ingallsj ingallsj deleted the releaseProbe branch May 4, 2021 06:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants