Level Zero - Fix OOM & Improve Thread Safety #845

Merged
merged 15 commits into from
May 8, 2024

pvelesko (Collaborator) commented May 5, 2024

  • Fix an issue where the Level Zero event collector did not recycle all events.
  • Refactor CHIPEventLevel0::wait() to eliminate the race conditions reported by valgrind. The Level Zero API states that calls to zeEventHostSynchronize and zeEventQuery are thread safe, but valgrind reports race conditions anyway.
  • Implement a global shared mutex, ApiMtx, that every HIP API call locks. This prevents multiple HIP commands from executing at the same time. It is a fairly coarse lock that we can relax over time; performance is affected only for multithreaded HIP applications, and I haven't seen any of those yet.
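
For readers unfamiliar with the pattern in the last item, here is a minimal sketch of a coarse, process-wide API lock. It is not the code from this PR: only the name ApiMtx comes from the description above. The use of std::mutex, the counter SubmittedCommands, and the functions fakeHipLaunch and runWorkers are hypothetical and exist only for illustration.

```cpp
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Global lock named after the ApiMtx mentioned in the PR description. Its real
// type and location are not shown there, so std::mutex is an assumption.
static std::mutex ApiMtx;

// Hypothetical stand-in for backend state that is not thread safe on its own.
static long SubmittedCommands = 0;

// Hypothetical API entry point: take the global lock first, then do the work.
// The lock is released automatically when Lock goes out of scope.
int fakeHipLaunch() {
  std::lock_guard<std::mutex> Lock(ApiMtx);
  ++SubmittedCommands;
  return 0;
}

// Calls the entry point from several threads and returns the final count.
long runWorkers(int NumThreads, int CallsPerThread) {
  {
    std::lock_guard<std::mutex> Lock(ApiMtx);
    SubmittedCommands = 0;
  }
  std::vector<std::thread> Workers;
  for (int T = 0; T < NumThreads; ++T)
    Workers.emplace_back([CallsPerThread] {
      for (int I = 0; I < CallsPerThread; ++I)
        fakeHipLaunch();
    });
  for (auto &W : Workers)
    W.join();
  std::lock_guard<std::mutex> Lock(ApiMtx);
  return SubmittedCommands;
}
```

Because every entry point serializes on the same lock, the unsynchronized counter is never updated by two threads at once. One caveat of this sketch: with a non-recursive std::mutex, an entry point that calls another locked entry point on the same thread would deadlock, so nested calls need an unlocked internal variant or a recursive mutex.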

pvelesko force-pushed the thread-safety branch 2 times, most recently from 67ec73e to 54e85b3 (May 5, 2024 22:36)
pvelesko changed the title from "EventPool - cleanup" to "Level Zero - Fix OOM & Improve Thread Safety" (May 5, 2024)
pvelesko marked this pull request as ready for review (May 5, 2024 22:38)
pvelesko requested a review from linehill (May 5, 2024 22:46)
pvelesko merged commit 994afd4 into main (May 8, 2024)
29 checks passed
2 participants