
[REVIEW]: Soundata: Reproducible use of audio datasets #6634

Closed
editorialbot opened this issue Apr 16, 2024 · 86 comments
Labels: accepted, published, Python, recommend-accept, review, TeX, Track: 7 (CSISM) Computer science, Information Science, and Mathematics

@editorialbot

editorialbot commented Apr 16, 2024

Submitting author: @magdalenafuentes (Magdalena Fuentes)
Repository: https://github.com/soundata/soundata
Branch with paper.md (empty if default branch): paper
Version: v1.0.1
Editor: @faroit
Reviewers: @hagenw, @hadware
Archive: 10.5281/zenodo.11580085

Status


Status badge code:

HTML: <a href="https://joss.theoj.org/papers/1b5f2c6b6aa01aa6d6410b71619239b8"><img src="https://joss.theoj.org/papers/1b5f2c6b6aa01aa6d6410b71619239b8/status.svg"></a>
Markdown: [![status](https://joss.theoj.org/papers/1b5f2c6b6aa01aa6d6410b71619239b8/status.svg)](https://joss.theoj.org/papers/1b5f2c6b6aa01aa6d6410b71619239b8)

Reviewers and authors:

Please avoid lengthy details of difficulties in the review thread. Instead, please create a new issue in the target repository and link to those issues (especially acceptance-blockers) by leaving comments in the review thread below. (For completists: if the target issue tracker is also on GitHub, linking the review thread in the issue or vice versa will create corresponding breadcrumb trails in the link target.)

Reviewer instructions & questions

@hagenw & @hadware, your review will be checklist based. Each of you will have a separate checklist that you should update when carrying out your review.
First of all you need to run this command in a separate comment to create the checklist:

@editorialbot generate my checklist

The reviewer guidelines are available here: https://joss.readthedocs.io/en/latest/reviewer_guidelines.html. If you have any questions or concerns, please let @faroit know.

Please start on your review when you are able, and be sure to complete your review in the next six weeks, at the very latest.

Checklists

📝 Checklist for @hadware

📝 Checklist for @hagenw

@editorialbot

Hello humans, I'm @editorialbot, a robot that can help you with some common editorial tasks.

For a list of things I can do to help you, just type:

@editorialbot commands

For example, to regenerate the paper pdf after making changes in the paper's md or bib files, type:

@editorialbot generate pdf

@editorialbot

Software report:

github.com/AlDanial/cloc v 1.90  T=9.83 s (22.3 files/s, 368502.8 lines/s)
-------------------------------------------------------------------------------
Language                     files          blank        comment           code
-------------------------------------------------------------------------------
JSON                            32              0              0        3587269
Python                         100           4179           5751          12252
CSV                             60              0              0          11127
reStructuredText                 9            628            975            629
YAML                             9             42             22            304
XML                              1              0              0            250
Markdown                         4             77              0            157
CSS                              1             34              0            124
TeX                              1              8              0             86
DOS Batch                        1              8              1             26
make                             1              4              7              9
-------------------------------------------------------------------------------
SUM:                           219           4980           6756        3612233
-------------------------------------------------------------------------------
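The JSON row accounts for almost all of the roughly 3.6 million counted lines. A quick Python check, using only the figures in the report above, makes the proportion explicit. Attributing those JSON lines mainly to the dataset index files discussed later in the thread (soundata/soundata#169) is an assumption on our part, not something cloc reports.

```python
from typing import Dict, Tuple

# (files, blank, comment, code) per language, copied from the cloc report above.
CLOC_ROWS: Dict[str, Tuple[int, int, int, int]] = {
    "JSON": (32, 0, 0, 3587269),
    "Python": (100, 4179, 5751, 12252),
    "CSV": (60, 0, 0, 11127),
    "reStructuredText": (9, 628, 975, 629),
    "YAML": (9, 42, 22, 304),
    "XML": (1, 0, 0, 250),
    "Markdown": (4, 77, 0, 157),
    "CSS": (1, 34, 0, 124),
    "TeX": (1, 8, 0, 86),
    "DOS Batch": (1, 8, 1, 26),
    "make": (1, 4, 7, 9),
}


def code_share(rows: Dict[str, Tuple[int, int, int, int]], language: str) -> float:
    """Fraction of all counted code lines that belong to `language`."""
    total_code = sum(code for *_, code in rows.values())
    return rows[language][3] / total_code


total_files = sum(files for files, *_ in CLOC_ROWS.values())
total_code = sum(code for *_, code in CLOC_ROWS.values())
print(total_files, total_code)                          # 219 3612233, as in the SUM row
print(f"JSON:   {code_share(CLOC_ROWS, 'JSON'):.1%}")    # JSON:   99.3%
print(f"Python: {code_share(CLOC_ROWS, 'Python'):.2%}")  # Python: 0.34%
```

In other words, the Python package itself is a small fraction of what cloc counts; the line total mostly reflects bundled data.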

Commit count by author:

   192	Tanmay Khandelwal
   147	Magdalena Fuentes
   101	Rachel Bittner
    90	Justin Salamon
    20	Vincent Lostanlen
    18	Genís Plaja
    18	Pablo Zinemanas
    17	Iran R. Roman
    15	David Rubinstein
    15	Pedro
    11	Name
     9	Marius Miron
     8	genisplaja
     7	Andreas Jansson
     7	Guillem Cortès
     7	Harsh Palan
     7	drubinstein
     7	iranroman
     4	Karn Watcharasupat
     4	Keunwoo Choi
     4	Thor
     3	Michael Scibor
     3	Qingyang (Tom) Xi
     2	Thor Kell
     1	Janne
     1	Kyungyun Lee
     1	ooyamatakehisa

@editorialbot

Paper file info:

📄 Wordcount for paper.md is 1130

✅ The paper includes a Statement of need section

@editorialbot

License info:

✅ License found: BSD 3-Clause "New" or "Revised" License (Valid open source OSI approved license)

@editorialbot

👉📄 Download article proof 📄 View article proof on GitHub 📄 👈

@editorialbot

Reference check summary (note 'MISSING' DOIs are suggestions that need verification):

OK DOIs

- 10.48550/arXiv.2109.02846 is OK
- 10.5281/zenodo.3527750 is OK
- 10.48550/arXiv.1201.0490 is OK
- 10.48550/arXiv.1605.08695 is OK
- 10.5281/zenodo.4061782 is OK
- 10.3390/app6060162 is OK
- 10.48550/arXiv.2106.04624 is OK
- 10.48550/arXiv.1912.01703 is OK

MISSING DOIs

- Errored finding suggestions for "TensorFlow Datasets, A collection of ready-to-use ...", please try later

INVALID DOIs

- None
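The report does not say how editorialbot decides that a DOI is OK. As a cheap local pre-check before regenerating the proof, a purely syntactic test can be sketched with Python's `re` module. The pattern is the one popularized in a Crossref blog post as matching the vast majority of modern DOIs; whether it suits every entry in this bibliography is an assumption, and passing it says nothing about whether a DOI is registered or resolves. The helper name `looks_like_doi` is hypothetical.

```python
import re

# Crossref-style pattern for modern DOIs: "10.", a 4-9 digit registrant code,
# a slash, then a suffix of letters, digits and a few punctuation characters.
# Syntax only: it cannot tell whether a DOI is actually registered.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/[-._;()/:A-Z0-9]+$", re.IGNORECASE)


def looks_like_doi(candidate: str) -> bool:
    """Return True if `candidate` is syntactically plausible as a DOI."""
    return bool(DOI_PATTERN.match(candidate.strip()))


# The DOIs listed as OK in the reference check above.
dois = [
    "10.48550/arXiv.2109.02846",
    "10.5281/zenodo.3527750",
    "10.48550/arXiv.1201.0490",
    "10.48550/arXiv.1605.08695",
    "10.5281/zenodo.4061782",
    "10.3390/app6060162",
    "10.48550/arXiv.2106.04624",
    "10.48550/arXiv.1912.01703",
]

for doi in dois:
    print(doi, "syntax OK" if looks_like_doi(doi) else "malformed")
```

A check like this catches typos such as a missing `10.` prefix, but an entry with no DOI at all (like the TensorFlow Datasets reference) still has to be resolved by hand.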

@hadware

hadware commented Apr 17, 2024

Review checklist for @hadware

Conflict of interest

  • I confirm that I have read the JOSS conflict of interest (COI) policy and that: I have no COIs with reviewing this work or that any perceived COIs have been waived by JOSS for the purpose of this review.

Code of Conduct

General checks

  • Repository: Is the source code for this software available at https://github.com/soundata/soundata?
  • License: Does the repository contain a plain-text LICENSE or COPYING file with the contents of an OSI approved software license?
  • Contribution and authorship: Has the submitting author (@magdalenafuentes) made major contributions to the software? Does the full list of paper authors seem appropriate and complete?
  • Substantial scholarly effort: Does this submission meet the scope eligibility described in the JOSS guidelines?
  • Data sharing: If the paper contains original data, data are accessible to the reviewers. If the paper contains no original data, please check this item.
  • Reproducibility: If the paper contains original results, results are entirely reproducible by reviewers. If the paper contains no original results, please check this item.
  • Human and animal research: If the paper contains original research data on human subjects or animals, does it comply with JOSS's human participants research policy and/or animal research policy? If the paper contains no such data, please check this item.

Functionality

  • Installation: Does installation proceed as outlined in the documentation?
  • Functionality: Have the functional claims of the software been confirmed?
  • Performance: If there are any performance claims of the software, have they been confirmed? (If there are no claims, please check off this item.)

Documentation

  • A statement of need: Do the authors clearly state what problems the software is designed to solve and who the target audience is?
  • Installation instructions: Is there a clearly-stated list of dependencies? Ideally these should be handled with an automated package management solution.
  • Example usage: Do the authors include examples of how to use the software (ideally to solve real-world analysis problems)?
  • Functionality documentation: Is the core functionality of the software documented to a satisfactory level (e.g., API method documentation)?
  • Automated tests: Are there automated tests or manual steps described so that the functionality of the software can be verified?
  • Community guidelines: Are there clear guidelines for third parties wishing to 1) contribute to the software, 2) report issues or problems with the software, and 3) seek support?

Software paper

  • Summary: Has a clear description of the high-level functionality and purpose of the software for a diverse, non-specialist audience been provided?
  • A statement of need: Does the paper have a section titled 'Statement of need' that clearly states what problems the software is designed to solve, who the target audience is, and its relation to other work?
  • State of the field: Do the authors describe how this software compares to other commonly-used packages?
  • Quality of writing: Is the paper well written (i.e., it does not require editing for structure, language, or writing quality)?
  • References: Is the list of references complete, and is everything cited appropriately that should be cited (e.g., papers, datasets, software)? Do references in the text use the proper citation syntax?

@hadware

hadware commented Apr 17, 2024

Hey everyone. Amazing work you did on that library, a real pleasure to be able to review it. I might actually also be using it in the near future :)

@faroit I'm not sure about the authorship: while most authors have made strong contributions to the codebase, some (like Guillem Cortès or Xavier Serra) have made considerably fewer than others. Is that an issue for JOSS? Furthermore, Rachel Bittner from Pysox has done a lot of work on this library; should she be credited as an author as well?

@magdalenafuentes

Hi @hadware!

Thanks for reviewing this 🙏

As @faroit said, I'm currently on leave, so I won't be able to follow the review process closely, but I can quickly clarify this: senior authors such as Xavier Serra and Juan Bello helped conceptualize the libraries (e.g., deciding which directions and datasets to focus on) and funded the team. @rabitt was initially on the author list, as we jointly created mirdata (the music cousin of soundata), but due to issues with her employer she had to remove herself; she knows about and is on board with us continuing the project without her. @faroit can clarify further if this presents any issues with JOSS policies.

@hadware

hadware commented Apr 17, 2024

All right, this clarifies it for me. I'll tick the box for now, and if @faroit thinks it isn't fair, we'll revert that.

@hagenw

hagenw commented May 6, 2024

Review checklist for @hagenw

Conflict of interest

  • I confirm that I have read the JOSS conflict of interest (COI) policy and that: I have no COIs with reviewing this work or that any perceived COIs have been waived by JOSS for the purpose of this review.

Code of Conduct

General checks

  • Repository: Is the source code for this software available at https://github.com/soundata/soundata?
  • License: Does the repository contain a plain-text LICENSE or COPYING file with the contents of an OSI approved software license?
  • Contribution and authorship: Has the submitting author (@magdalenafuentes) made major contributions to the software? Does the full list of paper authors seem appropriate and complete?
  • Substantial scholarly effort: Does this submission meet the scope eligibility described in the JOSS guidelines?
  • Data sharing: If the paper contains original data, data are accessible to the reviewers. If the paper contains no original data, please check this item.
  • Reproducibility: If the paper contains original results, results are entirely reproducible by reviewers. If the paper contains no original results, please check this item.
  • Human and animal research: If the paper contains original research data on human subjects or animals, does it comply with JOSS's human participants research policy and/or animal research policy? If the paper contains no such data, please check this item.

Functionality

  • Installation: Does installation proceed as outlined in the documentation?
  • Functionality: Have the functional claims of the software been confirmed?
  • Performance: If there are any performance claims of the software, have they been confirmed? (If there are no claims, please check off this item.)

Documentation

  • A statement of need: Do the authors clearly state what problems the software is designed to solve and who the target audience is?
  • Installation instructions: Is there a clearly-stated list of dependencies? Ideally these should be handled with an automated package management solution.
  • Example usage: Do the authors include examples of how to use the software (ideally to solve real-world analysis problems)?
  • Functionality documentation: Is the core functionality of the software documented to a satisfactory level (e.g., API method documentation)?
  • Automated tests: Are there automated tests or manual steps described so that the functionality of the software can be verified?
  • Community guidelines: Are there clear guidelines for third parties wishing to 1) contribute to the software, 2) report issues or problems with the software, and 3) seek support?

Software paper

  • Summary: Has a clear description of the high-level functionality and purpose of the software for a diverse, non-specialist audience been provided?
  • A statement of need: Does the paper have a section titled 'Statement of need' that clearly states what problems the software is designed to solve, who the target audience is, and its relation to other work?
  • State of the field: Do the authors describe how this software compares to other commonly-used packages?
  • Quality of writing: Is the paper well written (i.e., it does not require editing for structure, language, or writing quality)?
  • References: Is the list of references complete, and is everything cited appropriately that should be cited (e.g., papers, datasets, software)? Do references in the text use the proper citation syntax?

@hagenw

hagenw commented May 6, 2024

I would propose to include Rachel Bittner to the list of authors, see soundata/soundata#163

@hagenw

hagenw commented May 6, 2024

The installation instructions are not completely working at the moment, see soundata/soundata#153.

@faroit

faroit commented May 6, 2024

"I would propose to include Rachel Bittner to the list of authors, see soundata/soundata#163"

@hagenw, thanks for bringing this up again, but I think we have to accept Magdalena's comment here: #6634 (comment)

@hagenw

hagenw commented May 6, 2024

Thanks, I overlooked that. I closed the corresponding issue.

@hagenw

hagenw commented May 6, 2024

I have finished the review. It's a very nice contribution to the audio community.

Here is a list of the issues I would like to see tackled before I can tick all points in the checklist:

In addition, I would highly recommend working on the following issues:

@hagenw

hagenw commented May 17, 2024

All issues I listed as required are now addressed, and I have ticked all points as fulfilled in my review checklist.

@faroit

faroit commented May 17, 2024

@hadware can you also post some updates on your status, please?

@hadware

hadware commented May 17, 2024

Everything seems good to me as well. I would have liked issues soundata/soundata#159 and soundata/soundata#160 to be addressed, but I understand it might be too much work to do in a short amount of time (and may also be out of the scope of this review). What do you think, @hagenw?

Otherwise, I'm good. Cheers to soundata's authors for handling all of our comments so well :)

@hagenw

hagenw commented May 18, 2024

I would also highly recommend working on those issues, but as I understand the reviewer guidelines at https://joss.readthedocs.io/en/latest/reviewer_guidelines.html, especially:

"We like to think of JOSS as a ‘developer friendly’ journal. That is, if the submitting authors have followed best practices (have documentation, tests, continuous integration, and a license) then their review should be rapid."

I think the current state of the project already matches those best practices.

@guillemcortes

Hello @hadware, @hagenw,
Thanks for your useful comments, which helped improve soundata! Regarding soundata/soundata#159, I just want to mention that we are currently addressing it. You can check the progress here: soundata/soundata#169. Our proposal is to host all indexes on Zenodo and only package "sample" indexes with soundata (similar to the tests/resources files). Regarding soundata/soundata#160, we have it in mind, and it is the next item on our to-do list.
Thanks to both of you!

@guillemcortes

Hello @faroit,

We've merged soundata/soundata#169, which fixed a couple of issues raised by @hadware and @hagenw. As I understand it, we have complied with the checklists but we don't know what the next steps are. So please let us know if you need anything else from our side to move forward with the review.

@editorialbot

Attempting dry run of processing paper acceptance...

@editorialbot

Reference check summary (note 'MISSING' DOIs are suggestions that need verification):

OK DOIs

- 10.48550/arXiv.2109.02846 is OK
- 10.5281/zenodo.3527750 is OK
- 10.48550/arXiv.1201.0490 is OK
- 10.48550/arXiv.1605.08695 is OK
- 10.5281/zenodo.4061782 is OK
- 10.3390/app6060162 is OK
- 10.48550/arXiv.2106.04624 is OK
- 10.48550/arXiv.1912.01703 is OK

MISSING DOIs

- No DOI given, and none found for title: TensorFlow Datasets: A collection of ready-to-use ...

INVALID DOIs

- None

@editorialbot

👋 @openjournals/csism-eics, this paper is ready to be accepted and published.

Check final proof 👉📄 Download article

If the paper PDF and the deposit XML files look good in openjournals/joss-papers#5508, then you can now move forward with accepting the submission by compiling again with the command @editorialbot accept

@danielskatz

I found one more small change that is needed: soundata/soundata#180

@guillemcortes

Seen. Will merge now.

One more thing: I was just thinking about the following paragraph:

Increase reproducibility: Provides a common framework for researchers to compare and validate their data. It also allows researchers to easily propagate dataset updates or fixes to the audio community, ensuring that methods are still comparable and researchers have the same up-to-date dataset versions. On that note, Soundata is designed to handle multiple versions of the same dataset, allowing transparent access to all versions of the dataset.

I would change "researchers" to "users" (to match the rest of the article) and maybe also replace the second occurrence with "them". What do you think? It would look like this:

Increase reproducibility: Provides a common framework for users to compare and validate their data. It also allows them to easily propagate dataset updates or fixes to the audio community, ensuring that methods are still comparable and users have the same up-to-date dataset versions. On that note, Soundata is designed to handle multiple versions of the same dataset, allowing transparent access to all versions of the dataset.

@danielskatz

That sounds fine.

@guillemcortes

Merged. Can you regenerate the final proof PDF, please?

Thanks!

@danielskatz

@editorialbot recommend-accept

@editorialbot

Attempting dry run of processing paper acceptance...

@editorialbot

Reference check summary (note 'MISSING' DOIs are suggestions that need verification):

OK DOIs

- 10.48550/arXiv.2109.02846 is OK
- 10.5281/zenodo.3527750 is OK
- 10.48550/arXiv.1201.0490 is OK
- 10.48550/arXiv.1605.08695 is OK
- 10.5281/zenodo.4061782 is OK
- 10.3390/app6060162 is OK
- 10.48550/arXiv.2106.04624 is OK
- 10.48550/arXiv.1912.01703 is OK

MISSING DOIs

- No DOI given, and none found for title: TensorFlow Datasets: A collection of ready-to-use ...

INVALID DOIs

- None

@editorialbot

👋 @openjournals/csism-eics, this paper is ready to be accepted and published.

Check final proof 👉📄 Download article

If the paper PDF and the deposit XML files look good in openjournals/joss-papers#5509, then you can now move forward with accepting the submission by compiling again with the command @editorialbot accept

@danielskatz

@guillemcortes, does this seem OK to go ahead with now? Please confirm.

@guillemcortes

Hi @danielskatz, it looks good to me. I've just pinged the rest of the authors to see if they want to do a final check. Can I get back to you tomorrow or so?

@danielskatz

Sure, that's fine.

@guillemcortes

guillemcortes commented Jun 18, 2024

Hi @danielskatz,

I just discussed it with my colleagues and we agree to move forward. Thank you very much for your review.

@danielskatz

@editorialbot accept

@editorialbot

Doing it live! Attempting automated processing of paper acceptance...

@editorialbot

Ensure proper citation by uploading a plain text CITATION.cff file to the default branch of your repository.

If using GitHub, a Cite this repository menu will appear in the About section, containing both APA and BibTeX formats. When exported to Zotero using a browser plugin, Zotero will automatically create an entry using the information contained in the .cff file.

You can copy the contents for your CITATION.cff file here:

CITATION.cff

cff-version: "1.2.0"
authors:
- family-names: Fuentes
  given-names: Magdalena
  orcid: "https://orcid.org/0000-0003-4506-6639"
- family-names: Plaja-Roglans
  given-names: Genís
  orcid: "https://orcid.org/0000-0003-3450-3194"
- family-names: Cortès-Sebastià
  given-names: Guillem
  orcid: "https://orcid.org/0000-0003-2827-8955"
- family-names: Khandelwal
  given-names: Tanmay
  orcid: "https://orcid.org/0009-0004-3770-8317"
- family-names: Miron
  given-names: Marius
  orcid: "https://orcid.org/0000-0002-2563-075X"
- family-names: Serra
  given-names: Xavier
  orcid: "https://orcid.org/0000-0003-1395-2345"
- family-names: Bello
  given-names: Juan Pablo
  orcid: "https://orcid.org/0000-0001-8561-5204"
- family-names: Salamon
  given-names: Justin
  orcid: "https://orcid.org/0000-0001-6345-4593"
contact:
- family-names: Fuentes
  given-names: Magdalena
  orcid: "https://orcid.org/0000-0003-4506-6639"
doi: 10.5281/zenodo.11580085
message: If you use this software, please cite our article in the
  Journal of Open Source Software.
preferred-citation:
  authors:
  - family-names: Fuentes
    given-names: Magdalena
    orcid: "https://orcid.org/0000-0003-4506-6639"
  - family-names: Plaja-Roglans
    given-names: Genís
    orcid: "https://orcid.org/0000-0003-3450-3194"
  - family-names: Cortès-Sebastià
    given-names: Guillem
    orcid: "https://orcid.org/0000-0003-2827-8955"
  - family-names: Khandelwal
    given-names: Tanmay
    orcid: "https://orcid.org/0009-0004-3770-8317"
  - family-names: Miron
    given-names: Marius
    orcid: "https://orcid.org/0000-0002-2563-075X"
  - family-names: Serra
    given-names: Xavier
    orcid: "https://orcid.org/0000-0003-1395-2345"
  - family-names: Bello
    given-names: Juan Pablo
    orcid: "https://orcid.org/0000-0001-8561-5204"
  - family-names: Salamon
    given-names: Justin
    orcid: "https://orcid.org/0000-0001-6345-4593"
  date-published: 2024-06-18
  doi: 10.21105/joss.06634
  issn: 2475-9066
  issue: 98
  journal: Journal of Open Source Software
  publisher:
    name: Open Journals
  start: 6634
  title: "Soundata: Reproducible use of audio datasets"
  type: article
  url: "https://joss.theoj.org/papers/10.21105/joss.06634"
  volume: 9
title: "Soundata: Reproducible use of audio datasets"

If the repository is not hosted on GitHub, a .cff file can still be uploaded to set your preferred citation. Users will be able to manually copy and paste the citation.

Find more information on .cff files here and here.
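Citation File Format 1.2.0 requires four top-level keys in every CITATION.cff: `cff-version`, `message`, `authors`, and `title`, all of which appear in the file above. Before uploading an edited copy, a rough stdlib-only Python check for those keys can be sketched as follows. It is a naive line scan rather than a YAML parser, and the helper names are hypothetical; for real validation against the schema, a dedicated tool such as cffconvert is the better choice.

```python
from typing import List, Set

# Top-level keys that CFF 1.2.0 requires in every CITATION.cff file.
REQUIRED_KEYS = {"cff-version", "message", "authors", "title"}


def top_level_keys(cff_text: str) -> Set[str]:
    """Collect keys that start at column 0, e.g. 'title: ...' or 'authors:'.

    Indented lines (nested keys, wrapped scalars) and list items starting
    with '-' are skipped. This is a heuristic, not a YAML parser.
    """
    keys = set()
    for line in cff_text.splitlines():
        if not line or line[0] in " \t-#":
            continue
        key, sep, _ = line.partition(":")
        if sep:
            keys.add(key.strip())
    return keys


def missing_required_keys(cff_text: str) -> List[str]:
    """Return the required CFF keys that are absent, sorted alphabetically."""
    return sorted(REQUIRED_KEYS - top_level_keys(cff_text))


# Excerpt of the CITATION.cff content suggested above.
excerpt = '''cff-version: "1.2.0"
authors:
- family-names: Fuentes
  given-names: Magdalena
message: If you use this software, please cite our article in the
  Journal of Open Source Software.
title: "Soundata: Reproducible use of audio datasets"
'''

print(missing_required_keys(excerpt))                          # []
print(missing_required_keys('cff-version: "1.2.0"\ntitle: "x"\n'))  # ['authors', 'message']
```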

@editorialbot

🐘🐘🐘 👉 Toot for this paper 👈 🐘🐘🐘

@editorialbot

🚨🚨🚨 THIS IS NOT A DRILL, YOU HAVE JUST ACCEPTED A PAPER INTO JOSS! 🚨🚨🚨

Here's what you must now do:

  1. Check final PDF and Crossref metadata that was deposited 👉 Creating pull request for 10.21105.joss.06634 joss-papers#5511
  2. Wait five minutes, then verify that the paper DOI resolves https://doi.org/10.21105/joss.06634
  3. If everything looks good, then close this review issue.
  4. Party like you just published a paper! 🎉🌈🦄💃👻🤘

Any issues? Notify your editorial technical team...

@editorialbot added the accepted and published labels on Jun 18, 2024
@danielskatz

Congratulations to @magdalenafuentes (Magdalena Fuentes), @guillemcortes, and co-authors on your publication!!

And thanks to @hagenw and @hadware for reviewing, and to @faroit for editing!
JOSS depends on volunteers, and we couldn't do this without you.

@editorialbot

🎉🎉🎉 Congratulations on your paper acceptance! 🎉🎉🎉

If you would like to include a link to your paper from your README use the following code snippets:

Markdown:
[![DOI](https://joss.theoj.org/papers/10.21105/joss.06634/status.svg)](https://doi.org/10.21105/joss.06634)

HTML:
<a style="border-width:0" href="https://doi.org/10.21105/joss.06634">
  <img src="https://joss.theoj.org/papers/10.21105/joss.06634/status.svg" alt="DOI badge" >
</a>

reStructuredText:
.. image:: https://joss.theoj.org/papers/10.21105/joss.06634/status.svg
   :target: https://doi.org/10.21105/joss.06634

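The Markdown, HTML, and reStructuredText snippets above differ only in markup; each embeds the same two URLs derived from the paper DOI. A small Python helper that regenerates them for a given DOI could look like the sketch below. The function name is hypothetical, and the URL pattern is copied from the snippets for 10.21105/joss.06634; assuming it holds for every JOSS DOI is an extrapolation.

```python
from typing import Dict


def joss_badge_snippets(doi: str) -> Dict[str, str]:
    """Build README badge snippets for a JOSS paper from its DOI.

    Assumes the badge image lives at joss.theoj.org/papers/<doi>/status.svg
    and links to the DOI resolver, as in editorialbot's message above.
    """
    svg = f"https://joss.theoj.org/papers/{doi}/status.svg"
    target = f"https://doi.org/{doi}"
    return {
        "markdown": f"[![DOI]({svg})]({target})",
        "html": (
            f'<a style="border-width:0" href="{target}">\n'
            f'  <img src="{svg}" alt="DOI badge" >\n'
            f"</a>"
        ),
        "rst": f".. image:: {svg}\n   :target: {target}",
    }


snippets = joss_badge_snippets("10.21105/joss.06634")
for fmt, text in snippets.items():
    print(f"{fmt}:\n{text}\n")
```

Keeping the DOI in one place means the three snippets cannot drift apart if a README carries more than one of them.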

We need your help!

The Journal of Open Source Software is a community-run journal and relies upon volunteer effort. If you'd like to support us, please consider doing either one (or both) of the following:

@faroit

faroit commented Jul 3, 2024

👋 @genisplaja @guillemcortes @magdalenafuentes @hagenw @hadware

I just wanted to add that I'm super happy with how this submission turned out. Not only did we have a rather quick turnaround time, but the submission also improved substantially thanks to the very detailed and timely reviews and the way they were addressed. Overall, communication on all sides was very effective, which makes this a very positive example of how JOSS is designed to work.

If we had a best-paper/best-reviewer award, I would nominate this submission for both! Thanks a lot, everyone!

@magdalenafuentes

👋 @faroit and all,

The submission experience was great for us too, and we're immensely grateful to you, and to @hagenw and @hadware for the great reviewing work and suggestions. Soundata will be featured in JOSSCast and we'll make sure to talk about this very positive experience 🙏

@guillemcortes

👋 everyone,

Thank you for the kind words, @faroit. I second @magdalenafuentes's words. It has been a very enriching experience, and we have nothing but good things to say about all of you. Thanks again!
