Timestamps are None? Why? How to handle them? #6
Comments
Hi :) Thank you! Very happy to hear that you are using it and find it adds value. I have sometimes seen this when using a different language tag than the one of the speaker in the audio sample you are trying to transcribe. Please also try installing our custom transformers fork, which improves some aspects of the DTW alignment running in the background. You can install it with: pip install git+https://github.com/nyrahealth/transformers.git@crisper_whisper If this does not resolve your issue, it's unfortunately always hard to debug without having access to the audio where it is occurring. But if this does not resolve it, let me know and we can look into it further :)
Thanks for your quick response! However, I already installed that custom fork, because I found it in a closed issue. My data is in English, so it shouldn't be a problem with a language mismatch. Unfortunately, I cannot share the data because of ethical considerations. Let me dig deeper into the problem to see what could be happening :S In the worst case I would handle the exception :(
Uff, I see. Well, sorry I can't help you; it's tough to say what's going wrong here without being able to debug into it. I would however assume it's always the very last timestamp that's None? If that is the case, then you could maybe adjust this function here slightly
so that you only adjust this last timestamp with something manual that makes sense... for example the last timestamp + average word length, or something like this, depending on the application. If you encounter this with an audio file that you can share, I would be glad to help you :)
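The suggestion above could be sketched roughly like this. Note that this is only an illustrative post-processing idea, not code from CrisperWhisper itself; the function name and the word-dict layout are assumptions:

```python
# Hypothetical fallback for a missing final end timestamp: if the last
# word's "end" is None, estimate it as its start plus the average word
# duration observed in the rest of the segment.

def patch_last_timestamp(words):
    """words: list of dicts like {"word": str, "start": float, "end": float or None}."""
    complete = [w for w in words if w["start"] is not None and w["end"] is not None]
    if not complete:
        return words  # nothing to average over; leave input untouched
    avg_duration = sum(w["end"] - w["start"] for w in complete) / len(complete)
    last = words[-1]
    if last["end"] is None and last["start"] is not None:
        last["end"] = last["start"] + avg_duration
    return words
```

Whether the average word duration is a sensible estimate depends on the application; a fixed small constant would work just as well if the exact end time does not matter downstream.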
Your adjustment based on the average duration makes sense. Here you can see what I found when printing the timestamps:
The problem is the very last timestamp. What is weird is that even the start_timestamp is lower than the previous one :S Additionally, I can tell you that the model hallucinated a bit in the transcription.
I would love to look into that with the audio file; hard to tell otherwise. Playing around with the beam_size often helps quite a bit with hallucinations. Generally, a heuristic for detecting hallucinations is that timestamps on hallucinated content become very short, so it could be filtered on that (at least partly). I am soon going to look into the actual decoder cross-attention heads and see if one can clearly detect hallucinations from unusual cross-attention behaviour of those dedicated heads, and improve on the current version.
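The short-duration heuristic mentioned above could look something like the following sketch. The threshold value and the word-dict layout are assumptions, not anything shipped with CrisperWhisper:

```python
# Rough hallucination filter: drop words whose aligned duration is
# suspiciously short, since hallucinated content tends to receive very
# short timestamps. The threshold is a guess and should be tuned.

MIN_WORD_DURATION = 0.06  # seconds; illustrative value only

def filter_suspect_words(words, min_duration=MIN_WORD_DURATION):
    kept = []
    for w in words:
        if w["start"] is None or w["end"] is None:
            kept.append(w)  # cannot judge without timestamps; keep as-is
        elif w["end"] - w["start"] >= min_duration:
            kept.append(w)
    return kept
```

This only partly addresses hallucinations, as the comment above notes; legitimate very short words (fillers, clipped function words) can also fall under the threshold.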
Regarding the beam size, can I modify this hyper-parameter when using the model through Hugging Face? This is my code after following your tutorial:
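For reference, beam search can be enabled through the Hugging Face ASR pipeline by passing `num_beams` (a standard `generate` argument) via `generate_kwargs`. The model id and setup below follow the repo's usual usage but are assumptions about the asker's code:

```python
# Sketch: passing a beam-search setting through the Hugging Face
# automatic-speech-recognition pipeline. num_beams > 1 enables beam search.

GENERATE_KWARGS = {"num_beams": 4}  # example value; tune per application

def build_asr_pipeline(model_id="nyrahealth/CrisperWhisper"):
    # Imported inside the function so the sketch can be read without
    # transformers installed.
    from transformers import pipeline
    return pipeline(
        "automatic-speech-recognition",
        model=model_id,
        return_timestamps="word",
    )

# Usage (not executed here; requires downloading the model):
# asr = build_asr_pipeline()
# result = asr("audio.wav", generate_kwargs=GENERATE_KWARGS)
```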
By the way, I am experiencing other kinds of issues. Look at this trace:
Fortunately, it was a problem with the original Whisper model, and it has been solved by increasing the
Hi :)
First of all, of course, congrats for your work. I think CrisperWhisper is going to be so useful for the research community!
However, I am creating this issue because I am noticing that, when processing my data, sometimes the timestamps are None. I found this error, whose traceback is here:
Why is this happening? Is there a way to handle this situation, e.g., a try-catch to establish pause_duration=0 in case this happens? I have to process quite a lot of data and, although I would prefer another solution, I can accept a certain amount of mistakes.
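The try-catch idea described above could be sketched as follows. The `pause_duration` semantics and the word-dict layout are assumptions inferred from this thread, not CrisperWhisper's actual code:

```python
# Defensive sketch: compute the pause before a word, falling back to 0.0
# when a timestamp is None instead of raising.

def pause_before(prev_word, word):
    try:
        return max(0.0, word["start"] - prev_word["end"])
    except TypeError:  # raised when "start" or "end" is None
        return 0.0
```

This silently treats missing timestamps as zero-length pauses, which matches the "assume a certain amount of mistakes" trade-off; logging the affected words would make the error rate measurable.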
Thanks in advance. Best regards,
David.