Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Adding support for AudioStorageEngine #623

Merged
merged 187 commits into from
Apr 2, 2023
Merged

Conversation

jarulraj
Copy link
Member

@jarulraj jarulraj commented Mar 28, 2023

@@ -98,3 +98,20 @@ class IndexType(EVAEnum):
@classmethod
def is_faiss_index_type(cls, t):
return t in [cls.HNSW]


class ColumnName(EVAEnum):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Like this! Maybe make the name more descriptive and add similar logic for images

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about we call them VideoColumnNames?

Copy link
Member

@gaurav274 gaurav274 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need mp4 file?

Copy link
Member

@gaurav274 gaurav274 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall looks good apart from minor comments.

av_reader = decord.AVReader(self.file_url, mono=True, sample_rate=16000)
except decord._ffi.base.DECORDError as error_msg:
if "Can't find audio stream" in str(error_msg):
print("No audio stream in video")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should assert here

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And we should rather return the error message from decord.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Btw, we should check for AudioReader or VideoReader or AVReader?

@@ -108,12 +121,34 @@ def _read(self) -> Iterator[Dict]:
if begin % self._sampling_rate:
begin += self._sampling_rate - (begin % self._sampling_rate)
for frame_id in range(begin, end + 1, self._sampling_rate):
frame = video[frame_id].asnumpy()
frame = self.__get_frame(frame_id, av_reader, v_reader)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can yield directly from the function rather than checking for None again.

# gave a big enough batch_mem_size so that all frames fit in one batch
self.assertEqual(len(batches[0]), 996)
# verify that the 100th frame is read correctly and hope that all the other frames were too!
assert np.sum(batches[0].frames.loc[100]["audio"]) == pytest.approx(2.7592432)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about also pushing a small numpy of decoded data and comparing against it?

# gave a big enough batch_mem_size so that all frames fit in one batch
self.assertEqual(len(batches[0]), 996)
# verify that the 100th frame is read correctly and hope that all the other frames were too!
assert np.sum(batches[0].frames.loc[100]["audio"]) == pytest.approx(2.7592432)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above.

@@ -98,3 +98,20 @@ class IndexType(EVAEnum):
@classmethod
def is_faiss_index_type(cls, t):
return t in [cls.HNSW]


class ColumnName(EVAEnum):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about we call them VideoColumnNames?

av_reader = decord.AVReader(self.file_url, mono=True, sample_rate=16000)
except decord._ffi.base.DECORDError as error_msg:
if "Can't find audio stream" in str(error_msg):
print("No audio stream in video")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Btw, we should check for AudioReader or VideoReader or AVReader?

else:
break

def __get_frame(self, frame_id, av_reader, v_reader):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We shouldn't hit these decisions for every frame. How about we have 3 functions and use logic upfront to decide the correct callable and then just invoke it?

@suryatejreddy
Copy link
Collaborator

Just FYI, decord can't be installed via pypi on recent macos with pyhton3.9. Reference

super().__init__(*args, **kwargs)

def _read(self) -> Iterator[Dict]:
decord = _lazy_import_decord()
video = decord.VideoReader(self.file_url)
num_frames = int(len(video))
if self._read_audio:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: we can move this logic (reader selection) to a separate function. Just want to clean the code as much as possible.

}

def __get_audio_frame(self, frame_id, reader):
frame_audio, _ = reader[frame_id]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, we are going with audio per frame design. Will this work for whisper? If yes, we are good. We can worry about the sampling rate in the next PR when we add whisper.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I verified that it works with whisper

Copy link
Member

@gaurav274 gaurav274 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor comments. Fix and merge. Thanks for the effort!

@tushar-97 tushar-97 merged commit 8345190 into master Apr 2, 2023
@tushar-97 tushar-97 deleted the tushar-97-audio-engine branch April 2, 2023 23:47
@jarulraj jarulraj mentioned this pull request Apr 3, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants