ENH: allow and check for "annotated file = annotation file - annot ext + annotated format ext" #563
Comments
After renaming / refactoring in #566 I think it makes sense to do this all inside the
I think I have something that will work for this. The basic idea is to try the old way first (assume the audio filename is contained in the annotation filename, so it can be mapped from the annotation file stem), then try the new way (replace the annotation extension with the "annotated" extension). The good news is that this brings us a step closer to working with e.g. Audacity files, where the mapping is (LabelTrack) "txt" -> "wav", but I also can't help but feel that these functions in
Change issue name to reflect that this should work for any annotated file, including spectrogram files
Is your feature request related to a problem? Please describe.
Currently when preparing a vak dataset from a behavioral dataset of 1:1 audio + annotation files, we assume in all cases that the annotation file contains the name of the audio file that it annotates, e.g. "mouse1-day1.wav.csv" is the annotation for "mouse1-day1.wav". This convention is not clearly documented (yet, see #524), can give rise to subtle errors (as described in #525), and will actually cause failures with the new version of crowsetta because it is not true for all formats (e.g. Audacity TextLabels) that the audio filename will be contained in the annotation file name. So this also blocks #526 (but I guess is technically an enhancement given the way things work currently).
Many (most?) users will find it intuitive if the audio filename is simply the annotation filename with the annotation format extension removed and the audio format extension added in its place, e.g. "mouse1-day1.csv" is the annotation for "mouse1-day1.wav". Basically what @sthaar describes in #523 (comment).
Describe the solution you'd like
vak.io.audio.to_spect should check for both possibilities: that "audio_path = annot_path - annot_ext + audio_ext" or "audio_path = annot_path - annot_ext".
Not sure how to implement yet -- this part of the code base is spaghetti code-ish.
Simplest is "do the old way first and then try the new way if files don't exist when doing it the old way".
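A minimal sketch of the "old way first, then new way" lookup, to make the two mappings concrete. The function name find_annotated_path, its parameters, and the default extensions are hypothetical and only for illustration; this is not the existing vak or crowsetta code, and it assumes the annotated file sits in the same directory as the annotation file.

```python
from pathlib import Path


def find_annotated_path(annot_path, annot_ext=".csv", annotated_ext=".wav"):
    """Return the file that ``annot_path`` annotates, or None if not found.

    Hypothetical helper for illustration; not part of vak or crowsetta.
    """
    annot_path = Path(annot_path)
    name = annot_path.name
    if not name.endswith(annot_ext) or name == annot_ext:
        raise ValueError(
            f"expected a file name ending in {annot_ext!r}, got {name!r}"
        )

    # Strip the annotation extension from the file name.
    base = name[: -len(annot_ext)]

    candidates = [
        # Old way: audio_path = annot_path - annot_ext
        # e.g. "mouse1-day1.wav.csv" -> "mouse1-day1.wav"
        annot_path.with_name(base),
        # New way: audio_path = annot_path - annot_ext + audio_ext
        # e.g. "mouse1-day1.csv" -> "mouse1-day1.wav"
        annot_path.with_name(base + annotated_ext),
    ]
    # Return the first candidate that exists on disk, old way first.
    for candidate in candidates:
        if candidate.is_file():
            return candidate
    return None
```

Trying the old convention first keeps existing datasets working unchanged; the new convention is only consulted when the old one finds no file. A caller would presumably raise an error that names both conventions when None is returned.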
Would be nice if crowsetta annotation formats could "declare" the expected mapping from annotation file -> annotated file and then we could use that in vak. This could be a default with ability to override.
Describe alternatives you've considered
Leaving things as is. The idea here was to allow users to have other .csv files in the same directory. Making it either/or still allows that while also providing what many users expect, so I favor adding this.
Additional context
Order of operations should be: source_annot_map, raise error for current way that points to the functionality added above (as in "CLN: rename/refactor recursive_stem and add clearer error message", #525)