Whisper Crash Fix #19345

petermcaughan · 2024-01-31T03:25:27Z

Description

There is a current bug in the BeamSearch implementation of T5, GPT, and Whisper due to an interaction between two PRs merged in the past 7 months.

The first PR/code change added the BeamSearchScorer GPU implementation, which accelerates some operations by executing them on the GPU instead of the CPU. That change did not use a cudaStream when copying one particular variable from GPU to CPU (see nullptr value here: [link]).

The second PR/code change switched to using a cudaStream when initializing various memory buffers in BeamSearch (see stream included as the last argument in these allocations [link]).

In the period between these two PRs, I believe neither the copy nor the buffer initialization used a stream, so the two were synchronized. Once the latter PR was merged, the copy and the initialization ran on different streams, and the copy was no longer synchronized with the initialization.

The fix for this is to reintroduce the same stream into the copy operation added in the first PR.
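The ordering problem can be illustrated with a small toy model. This is a sketch only: it is not ORT code and does not call the CUDA runtime. `ToyStream`, `copy_on_different_stream`, `copy_on_same_stream`, and the two constants are hypothetical names invented for illustration. The model assumes only that operations queued on one stream run in FIFO order, while nothing orders operations queued on two different streams.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <utility>

// Toy stand-in for a CUDA stream (hypothetical, not the CUDA API):
// operations run in FIFO order within one stream, but nothing orders
// operations across two different streams.
struct ToyStream {
    std::deque<std::function<void()>> pending;

    void enqueue(std::function<void()> op) { pending.push_back(std::move(op)); }

    // Run every pending operation, oldest first.
    void drain() {
        while (!pending.empty()) {
            auto op = std::move(pending.front());
            pending.pop_front();
            op();
        }
    }
};

constexpr int kUninitialized = -1;  // stale buffer contents
constexpr int kInitialized = 42;    // value written by the initialization

// Buggy pattern: initialization and copy are queued on different streams,
// so one legal schedule runs the copy before the initialization.
int copy_on_different_stream() {
    ToyStream init_stream, copy_stream;
    int device_value = kUninitialized;
    int host_value = 0;
    init_stream.enqueue([&] { device_value = kInitialized; });
    copy_stream.enqueue([&] { host_value = device_value; });
    copy_stream.drain();  // the copy stream happens to run first
    init_stream.drain();
    return host_value;    // stale value reached the host
}

// Fixed pattern: both operations are queued on the same stream, so the
// copy always runs after the initialization.
int copy_on_same_stream() {
    ToyStream stream;
    int device_value = kUninitialized;
    int host_value = 0;
    stream.enqueue([&] { device_value = kInitialized; });
    stream.enqueue([&] { host_value = device_value; });
    stream.drain();
    return host_value;
}
```

In this model `copy_on_different_stream()` returns the stale value under the chosen schedule, while `copy_on_same_stream()` returns the initialized value under any schedule. The real failure depends on actual GPU timing, which is why it does not reproduce on every run.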

Motivation and Context

Because this is a race condition, the bug does not reproduce reliably on every hardware configuration or with every script, but when it does occur it completely breaks ORT execution with a BeamSearch model.

Links referenced in the description:

- nullptr stream in the GPU-to-CPU copy: https://github.com/microsoft/onnxruntime/blob/b65d3d0a5374daa3bc9272c2c02763a8428660db/onnxruntime/contrib_ops/cpu/transformers/beam_search_impl_t5.h#L213
- `stream` passed as the last argument in the allocations: https://github.com/microsoft/onnxruntime/blob/d1431e1b78fb81bf90fdc58c9118cb011171f387/onnxruntime/contrib_ops/cpu/transformers/beam_search_impl_base.h#L25

Co-authored-by: Peter McAughan <petermca@microsoft.com>

Peter McAughan added 4 commits January 30, 2024 22:10

Revert AllocateBuffer change

986c42e

More precise fix

167d555

Final fix

dd6be6c

Add fixes for T5 and GPT

505ce98

petermcaughan requested review from tianleiwu, yufenglee, RyanUnderhill and kunal-vaishnavi January 31, 2024 03:58

kunal-vaishnavi approved these changes Jan 31, 2024

tianleiwu added the release:1.17.0 label Jan 31, 2024

yufenglee approved these changes Jan 31, 2024

souptc approved these changes Jan 31, 2024

tianleiwu approved these changes Jan 31, 2024

petermcaughan merged commit 4562c91 into main Jan 31, 2024
98 checks passed

petermcaughan deleted the petermca/whisper_crash_fix branch January 31, 2024 05:53

sophies927 added the release:1.17.1 and triage:approved (Approved for cherrypicks for release) labels and removed the release:1.17.0 label Feb 5, 2024
