-
Notifications
You must be signed in to change notification settings - Fork 264
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Adding support for AudioStorageEngine #623
Conversation
jarulraj
commented
Mar 28, 2023
•
edited
Loading
edited
- Branch based on feat: Add support for AudioStorageEngine #620
eva/catalog/catalog_type.py
Outdated
@@ -98,3 +98,20 @@ class IndexType(EVAEnum): | |||
@classmethod | |||
def is_faiss_index_type(cls, t): | |||
return t in [cls.HNSW] | |||
|
|||
|
|||
class ColumnName(EVAEnum): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Like this! Maybe make the name more descriptive and add similar logic for images
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about we call them VideoColumnNames?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need mp4
file?
Review comments fixes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall looks good apart from minor comments.
eva/readers/decord_reader.py
Outdated
av_reader = decord.AVReader(self.file_url, mono=True, sample_rate=16000) | ||
except decord._ffi.base.DECORDError as error_msg: | ||
if "Can't find audio stream" in str(error_msg): | ||
print("No audio stream in video") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should assert
here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And we should rather return the error message from decord.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Btw, we should check for AudioReader
or VideoReader
or AVReader
?
eva/readers/decord_reader.py
Outdated
@@ -108,12 +121,34 @@ def _read(self) -> Iterator[Dict]: | |||
if begin % self._sampling_rate: | |||
begin += self._sampling_rate - (begin % self._sampling_rate) | |||
for frame_id in range(begin, end + 1, self._sampling_rate): | |||
frame = video[frame_id].asnumpy() | |||
frame = self.__get_frame(frame_id, av_reader, v_reader) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can yield directly from the function rather than checking for None
again.
test/readers/test_decord_reader.py
Outdated
# gave a big enough batch_mem_size so that all frames fit in one batch | ||
self.assertEqual(len(batches[0]), 996) | ||
# verify that the 100th frame is read correctly and hope that all the other frames were too! | ||
assert np.sum(batches[0].frames.loc[100]["audio"]) == pytest.approx(2.7592432) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about also pushing a small numpy of decoded data and comparing against it?
test/readers/test_decord_reader.py
Outdated
# gave a big enough batch_mem_size so that all frames fit in one batch | ||
self.assertEqual(len(batches[0]), 996) | ||
# verify that the 100th frame is read correctly and hope that all the other frames were too! | ||
assert np.sum(batches[0].frames.loc[100]["audio"]) == pytest.approx(2.7592432) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as above.
eva/catalog/catalog_type.py
Outdated
@@ -98,3 +98,20 @@ class IndexType(EVAEnum): | |||
@classmethod | |||
def is_faiss_index_type(cls, t): | |||
return t in [cls.HNSW] | |||
|
|||
|
|||
class ColumnName(EVAEnum): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about we call them VideoColumnNames?
eva/readers/decord_reader.py
Outdated
av_reader = decord.AVReader(self.file_url, mono=True, sample_rate=16000) | ||
except decord._ffi.base.DECORDError as error_msg: | ||
if "Can't find audio stream" in str(error_msg): | ||
print("No audio stream in video") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Btw, we should check for AudioReader
or VideoReader
or AVReader
?
eva/readers/decord_reader.py
Outdated
else: | ||
break | ||
|
||
def __get_frame(self, frame_id, av_reader, v_reader): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We shouldn't hit these decisions for every frame. How about we have 3 functions and use logic upfront to decide the correct callable and then just invoke it?
Just FYI, decord can't be installed via pypi on recent macos with pyhton3.9. Reference |
eva/readers/decord_reader.py
Outdated
super().__init__(*args, **kwargs) | ||
|
||
def _read(self) -> Iterator[Dict]: | ||
decord = _lazy_import_decord() | ||
video = decord.VideoReader(self.file_url) | ||
num_frames = int(len(video)) | ||
if self._read_audio: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: we can move this logic (reader selection) to a separate function. Just want to clean the code as much as possible.
eva/readers/decord_reader.py
Outdated
} | ||
|
||
def __get_audio_frame(self, frame_id, reader): | ||
frame_audio, _ = reader[frame_id] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, we are going with audio per frame design. Will this work for whisper? If yes, we are good. We can worry about the sampling rate in the next PR when we add whisper.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I verified that it works with whisper
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor comments. Fix and merge. Thanks for the effort!