Summary of debugging WarpCTCLayer #3076
Comments
Hi team, it seems like you guys have a pretty good idea of what could be happening. I will just add my 2c at the invitation of @wangkuiyi: i) if the CPU version doesn't throw up similar errors, then it's almost certainly a bug that needs to be fixed.
@sancha Many thanks for your helpful analysis and suggestions. It's interesting that i) In fact, illegal
Awesome, you have all your bases covered 👍 Happy to help if you need anything from me in the future.
Closing this issue due to inactivity; feel free to reopen it.
I often encounter an inf cost when training DeepSpeech2 (GPU version) on Mandarin data. It seems that WarpCTCLayer may have a numerical problem, so @qingqing01 and I have been trying to figure out what leads to inf. Since inf doesn't appear consistently, we first save the two inputs of WarpCTCLayer using printValueString, then parse and load the saved context during debugging. However, loading the saved exception context only increases the probability of inf; it does not guarantee that the problem can be reproduced every time.

For inf, we found two suspicious snippets.

seq2batchPadding
See seq2batchPadding for details. We detected -inf in batchValue_ just before calling hl_warpctc_compute_loss. Since seq2batchPadding is the only function other than resizeOrCreate that modifies batchValue_, we consider seq2batchPadding a suspicious cause.

Status: fixed by #3105
compute_probs_kernel

We also dug into the warp-ctc kernels and found that compute_probs_kernel can produce 0 after the exponent operation (see ctc_helper::exponential), so probs_ ends up containing 0. Unfortunately, probs_ is then passed into compute_alpha_kernel, where the illegal operation log(0) is detected at line 167 and line 191. We consider this a second suspicious cause.

Status: fixed by #5
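The underflow described above can be reproduced with plain arithmetic. The Python sketch below is an illustration, not warp-ctc's code: it uses double precision, where exp(-800) already rounds to 0.0; in single precision, as typically used on the GPU, exp(x) rounds to 0 for x below roughly -104. Taking the log of such a probability gives -inf, whereas computing log-probabilities directly as x - max - log(sum(exp(x - max))) stays finite. The function names are hypothetical, and the log-softmax route is one common remedy, not necessarily what #5 changed.

```python
import math

def softmax(logits):
    # Subtract the max for numerical stability; very negative
    # entries can still underflow to exactly 0.0 after exp.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def ieee_log(p):
    # math.log(0.0) raises ValueError in Python; mimic IEEE-754,
    # where log(0) evaluates to -inf (as it would inside a CUDA kernel).
    return float("-inf") if p == 0.0 else math.log(p)

def log_softmax(logits):
    # Compute log-probabilities directly, never taking log of an
    # underflowed probability: x - m - log(sum(exp(x - m))).
    m = max(logits)
    lse = math.log(sum(math.exp(x - m) for x in logits))
    return [x - m - lse for x in logits]

logits = [0.0, -800.0]
probs = softmax(logits)
print(probs)                         # [1.0, 0.0]
print([ieee_log(p) for p in probs])  # [0.0, -inf]
print(log_softmax(logits))           # [0.0, -800.0]
```

The second probability is a perfectly legitimate, tiny value that simply cannot be represented after exponentiation; keeping it in log space preserves it as -800.0 instead of turning it into -inf.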
Besides, we also encountered a validation error; details are listed below:
This fatal exception is thrown here. The reason hasn't been figured out yet.