Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: allow and check for "annotated file = annotation file - annot ext + annotated format ext" #563

Closed
4 tasks
NickleDave opened this issue Aug 17, 2022 · 3 comments · Fixed by #572
Closed
4 tasks
Labels
ENH: enhancement enhancement; new feature or request

Comments

@NickleDave
Copy link
Collaborator

Is your feature request related to a problem? Please describe.
Currently when preparing a vak dataset from a behavioral dataset of 1:1 audio + annotation files, we assume in all cases that the annotation file contains the name of the audio file that it annotates, e.g. "mouse1-day1.wav.csv" is the annotation for "mouse1-day1.wav". This convention is not clearly documented (yet, see #524), can give rise to subtle errors (as described in #525), and will actually cause failures with the new version of crowsetta because it is not true for all formats (e.g. Audacity TextLabels) that the audio filename will be contained in the annotation file name. So this also blocks #526 (but I guess is technically an enhancement given the way things work currently).

Many (most?) users will find it intuitive if the audio filename is simply the annotation filename with the annotation format extension removed and the audio format extension added in its place, e.g. "mouse1-day1.csv" is the annotation for "mouse1-day1.wav". Basically what @sthaar describes in #523 (comment).

Describe the solution you'd like
vak.io.audio.to_spect should check for both possibilities:
that "audio_path = annot_path - annot_ext + audio_ext" or "audio_path = annot_path - annot_ext".
Not sure how to implement yet -- this part of the code base is spaghetti code-ish.
Simplest is "do the old way first and then try the new way if files don't exist when doing it the old way".
Would be nice if crowsetta annotation formats could "declare" the expected mapping from annotation file -> annotated file and then we could use that in vak. This could be a default with ability to override.

Describe alternatives you've considered
Leaving things as is. The idea here was to allow users to have other .csv files in the same directory. Making it either/or still allows that while also providing what many users expect, so I favor adding this.

Additional context
Order of operations should be:

@NickleDave NickleDave added the ENH: enhancement enhancement; new feature or request label Aug 17, 2022
@NickleDave
Copy link
Collaborator Author

Not sure how to implement yet -- this part of the code base is spaghetti code-ish.
Simplest is "do the old way first and then try the new way if files don't exist when doing it the old way"

After renaming / refactoring in #566 I think it makes sense to do this all inside the map_annotated_to_annot function

@NickleDave
Copy link
Collaborator Author

I think I have something that will work for this.

The basic idea is to try the old way first (assume audio file is in any "annotated" name, and can be mapped to annotation file stem), then try the new way (replace annotation extension with "annotated" extension)

The good news is, that brings us a step closer to working with e.g. Audacity files where the mapping is (LabelTrack) "txt" -> "wav"

but I also can't help but feel that these functions in vak.annotation + vak.io.audio, vak.io.spect and vak.io.dataframe are spaghetti code that's trying to do too many things at once, and all of this would be much more readable if:

  • we were using a dataset class to build the dataset, that we just add audio + annotation files too and then spectrogram, as in ENH: Rewrite core.prep to use vocles + vocalpy, remove io module #558 basically
  • we limit the functionality of vak.io.dataframe.from_files so it basically does one of two things: make a dataset from an audio_dir or from a spect_dir. Don't offer the ability to pass in audio_files or spect_files from API, since this forces us to cover a lot of bases we never really use. If someone needs to do something more complicated, they can instead do it by hand with vocles then use the resulting dataset just like one built by vak

@NickleDave NickleDave changed the title ENH: allow and check for "audio file = annotation file - annot ext + audio format ext" ENH: allow and check for "annotated file = annotation file - annot ext + annotated format ext" Aug 28, 2022
@NickleDave
Copy link
Collaborator Author

Change issue name to reflect that this should work for any annotated file, including spectrogram files

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ENH: enhancement enhancement; new feature or request
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant