
[REVIEW]: audiomate: A Python package for working with audio datasets #2135

Closed
38 tasks done
whedon opened this issue Mar 3, 2020 · 68 comments
Assignees
Labels
accepted · published (Papers published in JOSS) · recommend-accept (Papers recommended for acceptance in JOSS) · review

Comments

whedon commented Mar 3, 2020

Submitting author: @ynop (Matthias Büchi)
Repository: https://github.com/ynop/audiomate
Version: v6.0.0
Editor: @terrytangyuan
Reviewer: @mulhod, @faroit
Archive: 10.5281/zenodo.3970567

Status

Status badge code:

HTML: <a href="https://joss.theoj.org/papers/b3b6733b4649ded21f99c68ba7e091d5"><img src="https://joss.theoj.org/papers/b3b6733b4649ded21f99c68ba7e091d5/status.svg"></a>
Markdown: [![status](https://joss.theoj.org/papers/b3b6733b4649ded21f99c68ba7e091d5/status.svg)](https://joss.theoj.org/papers/b3b6733b4649ded21f99c68ba7e091d5)

Reviewers and authors:

Please avoid lengthy details of difficulties in the review thread. Instead, please create a new issue in the target repository and link to those issues (especially acceptance-blockers) by leaving comments in the review thread below. (For completists: if the target issue tracker is also on GitHub, linking the review thread in the issue or vice versa will create corresponding breadcrumb trails in the link target.)

Reviewer instructions & questions

@mulhod & @faroit, please carry out your review in this issue by updating the checklist below. If you cannot edit the checklist please:

  1. Make sure you're logged in to your GitHub account
  2. Be sure to accept the invite at this URL: https://github.com/openjournals/joss-reviews/invitations

The reviewer guidelines are available here: https://joss.readthedocs.io/en/latest/reviewer_guidelines.html. If you have any questions or concerns, please let @terrytangyuan know.

Please try to complete your review in the next two weeks.

Review checklist for @mulhod

Conflict of interest

  • I confirm that I have read the JOSS conflict of interest (COI) policy and that: I have no COIs with reviewing this work or that any perceived COIs have been waived by JOSS for the purpose of this review.

Code of Conduct

General checks

  • Repository: Is the source code for this software available at the repository url?
  • License: Does the repository contain a plain-text LICENSE file with the contents of an OSI approved software license?
  • Contribution and authorship: Has the submitting author (@ynop) made major contributions to the software? Does the full list of paper authors seem appropriate and complete?

Functionality

  • Installation: Does installation proceed as outlined in the documentation?
  • Functionality: Have the functional claims of the software been confirmed?
  • Performance: If there are any performance claims of the software, have they been confirmed? (If there are no claims, please check off this item.)

Documentation

  • A statement of need: Do the authors clearly state what problems the software is designed to solve and who the target audience is?
  • Installation instructions: Is there a clearly-stated list of dependencies? Ideally these should be handled with an automated package management solution.
  • Example usage: Do the authors include examples of how to use the software (ideally to solve real-world analysis problems)?
  • Functionality documentation: Is the core functionality of the software documented to a satisfactory level (e.g., API method documentation)?
  • Automated tests: Are there automated tests or manual steps described so that the functionality of the software can be verified?
  • Community guidelines: Are there clear guidelines for third parties wishing to 1) Contribute to the software 2) Report issues or problems with the software 3) Seek support

Software paper

  • Summary: Has a clear description of the high-level functionality and purpose of the software for a diverse, non-specialist audience been provided?
  • A statement of need: Do the authors clearly state what problems the software is designed to solve and who the target audience is?
  • State of the field: Do the authors describe how this software compares to other commonly-used packages?
  • Quality of writing: Is the paper well written (i.e., it does not require editing for structure, language, or writing quality)?
  • References: Is the list of references complete, and is everything cited appropriately that should be cited (e.g., papers, datasets, software)? Do references in the text use the proper citation syntax?

Review checklist for @faroit

Conflict of interest

  • I confirm that I have read the JOSS conflict of interest (COI) policy and that: I have no COIs with reviewing this work or that any perceived COIs have been waived by JOSS for the purpose of this review.

Code of Conduct

General checks

  • Repository: Is the source code for this software available at the repository url?
  • License: Does the repository contain a plain-text LICENSE file with the contents of an OSI approved software license?
  • Contribution and authorship: Has the submitting author (@ynop) made major contributions to the software? Does the full list of paper authors seem appropriate and complete?

Functionality

  • Installation: Does installation proceed as outlined in the documentation?
  • Functionality: Have the functional claims of the software been confirmed?
  • Performance: If there are any performance claims of the software, have they been confirmed? (If there are no claims, please check off this item.)

Documentation

  • A statement of need: Do the authors clearly state what problems the software is designed to solve and who the target audience is?
  • Installation instructions: Is there a clearly-stated list of dependencies? Ideally these should be handled with an automated package management solution.
  • Example usage: Do the authors include examples of how to use the software (ideally to solve real-world analysis problems)?
  • Functionality documentation: Is the core functionality of the software documented to a satisfactory level (e.g., API method documentation)?
  • Automated tests: Are there automated tests or manual steps described so that the functionality of the software can be verified?
  • Community guidelines: Are there clear guidelines for third parties wishing to 1) Contribute to the software 2) Report issues or problems with the software 3) Seek support

Software paper

  • Summary: Has a clear description of the high-level functionality and purpose of the software for a diverse, non-specialist audience been provided?
  • A statement of need: Do the authors clearly state what problems the software is designed to solve and who the target audience is?
  • State of the field: Do the authors describe how this software compares to other commonly-used packages?
  • Quality of writing: Is the paper well written (i.e., it does not require editing for structure, language, or writing quality)?
  • References: Is the list of references complete, and is everything cited appropriately that should be cited (e.g., papers, datasets, software)? Do references in the text use the proper citation syntax?
whedon commented Mar 3, 2020

Hello human, I'm @whedon, a robot that can help you with some common editorial tasks. @mulhod , @faroit it looks like you're currently assigned to review this paper 🎉.

⭐ Important ⭐

If you haven't already, you should seriously consider unsubscribing from GitHub notifications for this (https://github.com/openjournals/joss-reviews) repository. As a reviewer, you're probably currently watching this repository, which means that, with GitHub's default behaviour, you will receive notifications (emails) for all reviews 😿

To fix this do the following two things:

  1. Set yourself as 'Not watching' at https://github.com/openjournals/joss-reviews

  2. You may also like to change your default notification settings for watching repositories in your GitHub profile here: https://github.com/settings/notifications

For a list of things I can do to help you, just type:

@whedon commands

For example, to regenerate the paper pdf after making changes in the paper's md or bib files, type:

@whedon generate pdf

whedon commented Mar 3, 2020

Reference check summary:

OK DOIs

- None

MISSING DOIs

- https://doi.org/10.1109/icassp.2015.7178964 may be missing for title: Librispeech: An ASR corpus based on public domain audio books

INVALID DOIs

- None

whedon commented Mar 3, 2020

mulhod commented Mar 4, 2020

Created issue about sox dependency: ynop/audiomate#101

mulhod commented Mar 4, 2020

Created issue about python 3.8 issues: ynop/audiomate#102

mulhod commented Mar 10, 2020

Created issue about benchmark test failure: ynop/audiomate#109

arfon commented Mar 14, 2020

Dear authors and reviewers,

We wanted to notify you that in light of the current COVID-19 pandemic, JOSS has decided to suspend submission of new manuscripts and to handle existing manuscripts (such as this one) on a "best efforts basis". We understand that you may need to attend to more pressing issues than completing a review or updating a repository in response to a review. If this is the case, a quick note indicating that you need to put a "pause" on your involvement with a review would be appreciated but is not required.

Thanks in advance for your understanding.

Arfon Smith, Editor in Chief, on behalf of the JOSS editorial team.

faroit commented Mar 19, 2020

Sorry for the delay. Things are a bit rough over here in 🇫🇷. I will hopefully be able to provide a review by next week.

faroit commented Apr 16, 2020

I am now back on the review. Thanks for your patience.

faroit commented May 18, 2020

audiomate is a Python-based package that addresses the extraction, transformation, and loading of audio datasets.
It comes with a large list of available datasets, many of which can be automatically downloaded from within Python. The package is very well done and excellently documented. It offers great value for the scientific audio community.

Concerns

I added a few concerns in the issues I opened. Most of them are minor and can quickly be addressed, or have already been addressed by the authors.

However, I have two main concerns that I would ask the authors to address before this paper gets accepted:

Related software tools

The paper does not mention related software packages. In such a section, other tools would be listed and compared to audiomate, highlighting key differences as well as unique features that emphasise the contributions of audiomate. Here are a few packages that are related:

Audio related software

Non-Audio related software

Reproducibility

In issue 126 I brought up reproducibility issues in audiomate. Compared to other software packages such as mirdata or torchvision.datasets, audiomate does not come with mechanisms to allow users to reproduce dataset downloads. Often there are two ways to foster this:

  • Verification: making sure that a downloaded dataset matches a reference file. This is usually verified through the use of checksums/hashes in other packages.
  • Versioning: making sure that a downloaded dataset is the same when it is downloaded again at a later time. This is usually achieved by referring to specific tags (e.g. on GitHub repos) or unique identifiers (e.g. DOIs on Zenodo).

Neither feature is currently covered by audiomate, and I would strongly suggest incorporating at least versioning into the software. For example, this could easily be implemented by changing links from the master branches of repos to specific commits/tags.
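To make the checksum idea concrete, here is a minimal sketch using only the Python standard library. `verify_download` is a hypothetical helper for illustration, not part of audiomate's actual API:

```python
import hashlib

def verify_download(path, expected_sha256):
    """Return True if the file at `path` matches the expected
    SHA-256 hex digest of the reference release."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in chunks so large audio archives need not fit in memory.
        for chunk in iter(lambda: f.read(1 << 16), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_sha256
```

A downloader following this pattern would ship a table of known digests per dataset release and warn (or abort) on a mismatch.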

Recommendation

For this paper to be accepted in JOSS, I would suggest making these changes: adding a related-work section to the paper and versioning the dataset downloads. However, I am happy to discuss these issues here with the authors and other reviewers, and I leave the final decision to @terrytangyuan.

arfon commented May 20, 2020

Thanks @faroit! @ynop - please let us know when you've had a chance to make changes in response to @faroit's review.

ynop commented May 25, 2020

Hi @faroit, thanks for your review.

Regarding reproducibility, I completely agree with you that versioning and verification would be cool. But this is a lot of work and I am not sure if we can do that in the near future. The only change that is possible with reasonable effort is using fixed commits for git-based datasets. The problem with that is that you have to update audiomate every time a new version is released (and know that there is a new version). Furthermore, a user can fix the version of such datasets on her/his own by passing the specific URL to the downloader.

Regarding related software, I will add a section in the paper.

faroit commented May 25, 2020

The only change that is possible with reasonable effort is using fixed commits for git-based datasets.
The problem with that is that you have to update audiomate, every time a new version is released (and know that there is a new version).

I think there is no way around this. Given how infrequently datasets are actually updated, the workload is quite small, and I would say that reproducibility is more important for researchers than having the most recent version of a dataset. Moreover, this would currently only affect 3 datasets in audiomate, so this change would be rather quick, I guess.

Furthermore, a user can fix the version of such datasets on her/his own by passing the specific url to the downloader.

Maybe have a look at torch.hub to implement a similar repo/tag scheme?
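The versioning idea being discussed can be sketched in a few lines: pin each git-hosted dataset to an immutable tag or commit when constructing the download URL, instead of pointing at `master`. The helper and the repo/tag names below are purely illustrative, not audiomate's actual downloader API:

```python
def pinned_archive_url(owner: str, repo: str, ref: str) -> str:
    """Build a GitHub source-archive URL pinned to an immutable
    tag or commit hash instead of the moving `master` branch."""
    return f"https://github.com/{owner}/{repo}/archive/{ref}.tar.gz"

# Pinned: the archive content never changes between downloads.
# pinned_archive_url("some-org", "some-dataset", "v1.0")
# Unpinned (what to avoid): `.../archive/master.tar.gz` may change over time.
```

Under this scheme, upgrading to a newer dataset release is an explicit change of the `ref` argument rather than a silent change upstream.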

ynop commented May 25, 2020

Another thought: what happens if we update the version in audiomate? The user either has to keep working with the old audiomate version (and fix it) or use a custom download URL afterwards. I am not sure it is a good approach to have the dataset versions depend on the software version, since this is not really transparent to the user.

faroit commented May 26, 2020

@ynop

What happens if we update the version in audiomate? The user either has to keep working with the old audiomate version (and fix it) or use a custom download URL afterwards. I am not sure it is a good approach to have the dataset versions depend on the software version, since this is not really transparent to the user.

I did a bit of research and found that most libraries actually do not support a good way to version datasets. In many cases, the dataset is just verified with a checksum, so when the dataset changes, the checksum fails.

A detailed solution is proposed by the TensorFlow tfds team. Basically, they followed your idea of implementing versions for each dataset. Now I agree that this is a significant amount of work, so I agree to take this out of the JOSS review.

arfon commented Jun 24, 2020

Now I agree that this is a significant amount of work, so I agree to take this out of the JOSS review.

Thanks @faroit. Could you check on @ynop's most recent updates to see if you're able to check off the last few checkboxes for your review?

faroit commented Jun 24, 2020

@arfon as far as I can see, there have not been any changes made to the paper.

aahlenst commented

Sorry, it's stuck on my desk. Swamped with work. Should get better next week.

ynop commented Jul 1, 2020

@whedon generate pdf

whedon commented Jul 1, 2020

ynop commented Aug 3, 2020

@terrytangyuan New version is v6.0.0 and DOI is 10.5281/zenodo.3970567

terrytangyuan commented
@whedon set v6.0.0 as version

whedon commented Aug 3, 2020

OK. v6.0.0 is the version.

terrytangyuan commented
@whedon set 10.5281/zenodo.3970567 as archive

whedon commented Aug 3, 2020

OK. 10.5281/zenodo.3970567 is the archive.

terrytangyuan commented
@whedon accept

whedon commented Aug 3, 2020

Attempting dry run of processing paper acceptance...

whedon added the recommend-accept label Aug 3, 2020
whedon commented Aug 3, 2020

Reference check summary:

OK DOIs

- 10.1109/icassp.2015.7178964 is OK

MISSING DOIs

- https://doi.org/10.1016/b978-0-08-099388-1.00019-4 may be missing for title: Audio Datasets

INVALID DOIs

- None

whedon commented Aug 3, 2020

👋 @openjournals/joss-eics, this paper is ready to be accepted and published.

Check final proof 👉 openjournals/joss-papers#1609

If the paper PDF and Crossref deposit XML look good in openjournals/joss-papers#1609, then you can now move forward with accepting the submission by compiling again with the flag deposit=true e.g.

@whedon accept deposit=true

terrytangyuan commented
@openjournals/joss-eics This paper looks good to me now. Handing over to you!

danielskatz commented
@openjournals/dev - note that in the XML there are a bunch of badly formed URLs in unstructured citations. This is a bug we've had for a while.

  1. What should we do in this case?
  2. Can we fix the bug?

danielskatz commented
@ynop - please merge ynop/audiomate#133

ynop commented Aug 3, 2020

@danielskatz Done.

arfon commented Aug 5, 2020

@whedon accept

whedon commented Aug 5, 2020

Attempting dry run of processing paper acceptance...

whedon commented Aug 5, 2020

Reference check summary:

OK DOIs

- 10.1109/icassp.2015.7178964 is OK

MISSING DOIs

- https://doi.org/10.1016/b978-0-08-099388-1.00019-4 may be missing for title: Audio Datasets

INVALID DOIs

- None

whedon commented Aug 5, 2020

👋 @openjournals/joss-eics, this paper is ready to be accepted and published.

Check final proof 👉 openjournals/joss-papers#1626

If the paper PDF and Crossref deposit XML look good in openjournals/joss-papers#1626, then you can now move forward with accepting the submission by compiling again with the flag deposit=true e.g.

@whedon accept deposit=true

danielskatz commented
@whedon accept deposit=true

whedon added the accepted label Aug 5, 2020
whedon commented Aug 5, 2020

Doing it live! Attempting automated processing of paper acceptance...

whedon added the published label Aug 5, 2020
whedon commented Aug 5, 2020

🐦🐦🐦 👉 Tweet for this paper 👈 🐦🐦🐦

whedon commented Aug 5, 2020

🚨🚨🚨 THIS IS NOT A DRILL, YOU HAVE JUST ACCEPTED A PAPER INTO JOSS! 🚨🚨🚨

Here's what you must now do:

  1. Check final PDF and Crossref metadata that was deposited 👉 Creating pull request for 10.21105.joss.02135 joss-papers#1627
  2. Wait a couple of minutes to verify that the paper DOI resolves https://doi.org/10.21105/joss.02135
  3. If everything looks good, then close this review issue.
  4. Party like you just published a paper! 🎉🌈🦄💃👻🤘

Any issues? Notify your editorial technical team...

danielskatz commented
Thanks to @mulhod and @faroit for reviewing, and @terrytangyuan for editing!

Congratulations to @ynop (Matthias Büchi) and co-author!!

whedon commented Aug 5, 2020

🎉🎉🎉 Congratulations on your paper acceptance! 🎉🎉🎉

If you would like to include a link to your paper from your README use the following code snippets:

Markdown:
[![DOI](https://joss.theoj.org/papers/10.21105/joss.02135/status.svg)](https://doi.org/10.21105/joss.02135)

HTML:
<a style="border-width:0" href="https://doi.org/10.21105/joss.02135">
  <img src="https://joss.theoj.org/papers/10.21105/joss.02135/status.svg" alt="DOI badge" >
</a>

reStructuredText:
.. image:: https://joss.theoj.org/papers/10.21105/joss.02135/status.svg
   :target: https://doi.org/10.21105/joss.02135

This is how it will look in your documentation:

(DOI badge image)

We need your help!

Journal of Open Source Software is a community-run journal and relies upon volunteer effort. If you'd like to support us, please consider doing one (or both) of the following:
