Really, no one has encountered such problems? :(
Hello,

I am using online recognition as in this example:
https://github.com/NVIDIA/NeMo/blob/main/tutorials/asr/Online_ASR_Microphone_Demo.ipynb

But when a large buffer size is hard-coded (for example, 40 seconds), recognition quality on short files suffers. This is because the unused part of the buffer is filled with zeros in the `transcribe()` function:
```python
frame_len = 40
self.n_frame_len = int(frame_len * self.sr)
self.buffer = np.zeros(shape=self.n_frame_len, dtype=np.float32)

def _decode(self, frame):
    assert len(frame) == self.n_frame_len
    self.buffer[:len(frame)] = frame
    logits = infer_signal(asr_model, self.buffer).cpu().numpy()[0]
    decoded = self._greedy_decoder(logits, self.vocab)
    return decoded

@torch.no_grad()
def transcribe(self, frame=None, merge=True):
    if frame is None:
        frame = np.zeros(shape=self.n_frame_len, dtype=np.float32)
    if len(frame) < self.n_frame_len:
        frame = np.pad(frame, [0, self.n_frame_len - len(frame)], 'constant')
    unmerged = self._decode(frame)
    if not merge:
        return unmerged
    return self.greedy_merge(unmerged)
```
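To put a number on the padding (a quick sketch of my own, not code from the notebook; the 16 kHz sample rate is an assumption based on typical NeMo ASR models): a 2-second utterance placed in a 40-second buffer is 95% zeros, so the acoustic model sees mostly silence.

```python
import numpy as np

sr = 16000                    # assumed sample rate of the demo model
frame_len = 40                # hard-coded buffer length in seconds
n_frame_len = frame_len * sr

clip = np.random.randn(2 * sr).astype(np.float32)  # a 2-second utterance
buffer = np.zeros(n_frame_len, dtype=np.float32)
buffer[:len(clip)] = clip                          # same fill as _decode()

zero_fraction = 1.0 - len(clip) / n_frame_len
print(f"{zero_fraction:.0%} of the buffer is zeros")  # 95%
```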
Switching to a resizable buffer fixes the quality issue, but leads to GPU memory leaks:
```python
def _decode(self, frame):
    logits = infer_signal(asr_model, frame).cpu().numpy()[0]
    decoded = self._greedy_decoder(logits, self.vocab)
    return decoded

@torch.no_grad()
def transcribe(self, frame=None, merge=True):
    if frame is None:
        frame = np.zeros(shape=self.n_frame_len, dtype=np.float32)
    unmerged = self._decode(frame)
    if not merge:
        return unmerged
    return self.greedy_merge(unmerged)
```
What is the cause of this memory leak, and what solutions are there other than a fixed buffer size?
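One middle ground I have been considering (a hypothetical sketch, not the notebook's API; the bucket sizes and the `pad_to_bucket` helper are my own): pad each frame up to the nearest of a few fixed bucket lengths, so the model only ever sees a handful of input shapes instead of an unbounded variety, which bounds how many differently-sized GPU tensors get allocated.

```python
import numpy as np

SR = 16000                      # assumed sample rate
BUCKETS_SEC = (5, 10, 20, 40)   # assumed bucket sizes; tune for your audio

def pad_to_bucket(frame: np.ndarray, sr: int = SR) -> np.ndarray:
    """Zero-pad `frame` up to the smallest bucket length that fits it."""
    for sec in BUCKETS_SEC:
        n = sec * sr
        if len(frame) <= n:
            return np.pad(frame, (0, n - len(frame)), mode="constant")
    raise ValueError("frame longer than the largest bucket")
```

A short clip then gets at most a few seconds of zero padding instead of 38, while long clips still fit. Whether this fully avoids the growth depends on the allocator behavior, but it at least caps the number of distinct input shapes.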