New tool: Required publication references #236
I would also add the versions of each tool too.
It might be good to host a central database (e.g. yaml) of tools and their associated information. This can then be used to parse the conda yaml to create a tool-specific publication description that would be linked by release to the pipeline. It would be much neater to just reference the pipeline in papers (if morally possible) - with a sentence pointing to the pipeline for all the tool-specific citations. I've often been asked to trim down text and a decision may need to be made as to which tools you cite... I generally provide a short description of the tool, version, reference and pubmed id. Maybe we can provide this as a file that gets bundled with the pipeline that can be linked on the pipeline home page?
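A minimal sketch of what such a central tool database could look like, assuming a hand-maintained YAML file (all field names and values here are hypothetical placeholders, not an agreed schema):

```yaml
# Hypothetical central tool database: one entry per tool.
# Field names are illustrative only, not an agreed schema.
exampletool:
  description: Quality control tool for sequencing data
  version: "1.2.3"
  doi: 10.0000/placeholder    # real publication DOI would go here
  pubmed: "00000000"          # placeholder PMID
  homepage: https://example.org/exampletool
```

One file like this per pipeline (or one shared registry) could then be rendered into the short description / version / reference / pubmed-id blurb described above.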
Hm, I was just thinking that we get this information for free over the Anaconda API, right? Although package maintainers do not always provide all fields (which is bad!). So instead of having another yaml file, we could use the API directly.
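As a sketch of what that lookup could look like: the endpoint is the public Anaconda API, but the exact response fields vary by package and the field names used below are my assumption, so every access is defensive:

```python
import json
from urllib.request import urlopen

ANACONDA_API = "https://api.anaconda.org/package/{channel}/{name}"


def fetch_package_info(name, channel="bioconda"):
    """Fetch raw package metadata from the Anaconda API (needs internet)."""
    with urlopen(ANACONDA_API.format(channel=channel, name=name)) as resp:
        return json.load(resp)


def summarise(pkg):
    """Keep only the fields useful for a citation summary.

    Maintainers do not always fill in every field, so missing keys
    simply come back as None rather than raising.
    """
    return {
        "name": pkg.get("name"),
        "version": pkg.get("latest_version"),
        "summary": pkg.get("summary"),
        "home": pkg.get("home"),
        "license": pkg.get("license"),
    }
```

The summarising step works on any already-fetched JSON dict, so it can also be pointed at cached responses for offline runs.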
We might want to add extra information, like an actual publication or DOI for the pipeline.
Hm, I see. There is no such thing as a tool registry with DOI and publication URIs, right? Maybe we need this...
I see where you're going with this, however I quite like that all pipelines are totally self-sufficient currently. Especially if this will be used within tool execution, as many users run offline.
I don't think that it is morally good to do this. If people decide that they need to do this then that can be on their shoulders, but I don't think that we should help them.
Yes - this is basically the information that I was thinking of listing (though DOI instead of pubmed). A table with this information would be a nice output option too though..
Yes, that could be very nice actually. We have an …
Not really - we're already using this for the … Tying the names in with …
Maybe we should activate this discussion again: nextflow-io/nextflow#866 Tools and parameters that are used in Nextflow should be described in a structured way, so both humans and machines can work with them. I also see the tools metadata such as URI, URL, description and parameters combined there... Just brain-storming here.
How about tool-specific parameters? e.g. if you aren't using the defaults. I generally provide these as a double-quoted string for full traceability and reproducibility. Would it be enough to have these defined within …
Yes, I wondered about putting this kind of information alongside the parameter schema described in that issue. However, parameters and tool metadata are distinct, so it may not make sense. For example, it could break parsing by the general form-building tools discussed on that thread. A section of …
This is getting a bit off-topic now 😅 But yes, I think having them defined in …
IMO maintaining a separate annotation file does not work, because it very easily gets out of sync with the actual tools used in the pipeline script. Ideally this info should be inferred during process execution nextflow-io/nextflow#879. Alternatively, we could add an annotation in the module/process definition nextflow-io/nextflow#984. Otherwise the best approximation could be the Conda environment file, though if I'm understanding correctly, the problem is that it does not include the citation/paper DOI, right? Not sure, but I think using the tool name and version it should be possible to infer the related metadata from biotools. Pinging @bgruening and @ypriverol who should know about the state of the art of bioconda/containers/biotools interoperability.
@ewels @pditommaso we actually do include identifiers in conda, see here: https://github.com/bioconda/bioconda-recipes/blob/master/recipes/multiqc/meta.yaml#L137 This means you can infer this from the conda package or bio.tools. A DOI can, and should, be added to the conda package as well. Does this answer your question?
Uh, this is actually very nice. Just checked the API request for fasttree: seems that we get the information we need from it, so no need to have an additional file.
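A rough sketch of pulling publication info out of a bio.tools API response: the public API returns a `publication` list with `doi`/`pmid` keys, but treat the field names here as an assumption and code defensively:

```python
import json
from urllib.request import urlopen

BIOTOOLS_API = "https://bio.tools/api/tool/{tool_id}?format=json"


def fetch_biotools_entry(tool_id):
    """Fetch a tool record from bio.tools (needs internet)."""
    with urlopen(BIOTOOLS_API.format(tool_id=tool_id)) as resp:
        return json.load(resp)


def extract_dois(entry):
    """Collect the DOIs of all publications attached to a bio.tools record.

    Entries without a DOI (e.g. PMID-only publications) are skipped.
    """
    pubs = entry.get("publication") or []
    return [p["doi"] for p in pubs if p.get("doi")]
```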
Fantastic - this is great news! Many thanks @bgruening - I didn't know that this lookup existed. Any ideas on how we can best fetch this information? If we can, it would be great to use this method. If we want, we could even get the linter to warn if the biotools identifier is missing.
Also under the identifiers section, as done here I guess? Cool! I'll add this to the MultiQC recipe.
Short answer is that it's part of the tarball, and with that also part of the installation, afaik.
Yes :)
Ok cool, thanks! Then I wonder if the best bet is to just try pinging the biotools API with the conda package name if it's in the bioconda channel. I guess that the two will essentially always be the same... This won't match up versions and could in some weird edge cases give the wrong information, so not ideal. But I don't really fancy downloading and extracting all software just for this fast little utility command.
..could also just grab the raw bioconda …
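Sketch of that approach: build the raw GitHub URL for a recipe's `meta.yaml` and pull the `identifiers:` entries out with a small regex (the URL layout follows the bioconda-recipes repo structure; a regex is used rather than a YAML parser because recipes embed Jinja templating that plain YAML parsers can choke on):

```python
import re

RAW_RECIPE_URL = (
    "https://raw.githubusercontent.com/bioconda/bioconda-recipes/"
    "master/recipes/{name}/meta.yaml"
)


def recipe_url(name):
    """Raw URL of a bioconda recipe's meta.yaml."""
    return RAW_RECIPE_URL.format(name=name)


def parse_identifiers(meta_yaml_text):
    """Extract '- key:value' entries from the extra/identifiers section.

    Returns e.g. {"biotools": "exampletool", "doi": "10.0000/..."}.
    An empty dict means no identifiers section was found.
    """
    match = re.search(r"identifiers:\n((?:\s+-\s+\S+\n?)+)", meta_yaml_text)
    if not match:
        return {}
    ids = {}
    for line in match.group(1).splitlines():
        entry = line.strip().lstrip("-").strip()
        if ":" in entry:
            key, _, value = entry.partition(":")
            ids[key.strip()] = value.strip()
    return ids
```

This only needs one small HTTP GET per tool, but ties you to the recipe for the latest version rather than the pinned one.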
This depends on whether you always have internet access during the workflow run. I guess querying the API is ok. I suppose digging the information out of conda is also easy - that should already be available locally.
Ah true, there are two different use cases here. I was thinking primarily about a new command. For using the data within a workflow run (e.g. saving it to an …)
..but we'd still need an internet connection for bio.tools. I think that this needs to be a separate CLI tool. If we want the output as a results file with the pipeline, then this should probably be a static file which is saved separately, I think. If we want automation, the lint tool could check that it exists and is up to date (maybe on …)
Have a look at …
This issue is getting much more manageable with DSL2 modules, where we have a meta file for each tool that includes a DOI 🎉 (typically taken from Bioconda). This could potentially be used both for a command line tool but also within pipelines, as the meta file should be bundled within each pipeline.
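For reference, the per-tool meta file mentioned here looks something like this (a trimmed, illustrative sketch of a DSL2 module `meta.yml`; the tool name, URLs and DOI are placeholders):

```yaml
# meta.yml bundled alongside a DSL2 module (illustrative sketch)
name: exampletool
description: Run exampletool on FASTQ input
tools:
  - exampletool:
      description: An example QC tool
      homepage: https://example.org/exampletool
      documentation: https://example.org/exampletool/docs
      doi: 10.0000/placeholder   # typically taken from the Bioconda recipe
```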
Following on from #2326 (which starts providing a framework to insert this into a MultiQC report): @maxulysse and @mashehu have both said we should automate this even more, and that it should be possible via the DOIs in the module meta files. From @maxulysse, a conceptual plan:
Initial problems I see:
It would be nice to make it easier for people to know what should be referenced if they use a pipeline in a manuscript. For example, `nf-core references <pipeline-name>` could return a list of the references that you need to add into your paper (alt names: `nf-core refs`, `nf-core bib`..?). Different flags could give different output formats, but perhaps the default could be prose text.

Need to think about where and how to capture this information in the pipeline files. For example, a simple YAML file could work nicely.

Requirements:

Output options could be:

The nextflow and nf-core references can be hardcoded. The workflow DOI can be lifted from `README.md` I guess, or could potentially be added as a new `workflow.metadata` variable?

Thoughts / feedback?

Phil
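To make the default prose output concrete, a minimal sketch of how a `references` command might render citations from a per-tool metadata mapping (the pipeline name, tool names, DOIs and the exact wording are all placeholders, and the input schema is an assumption):

```python
def format_prose(pipeline, tools):
    """Render a citation sentence for a pipeline and its tools.

    `tools` maps tool name -> {"version": ..., "doi": ...}; any missing
    DOI or version is silently skipped rather than fabricated.
    """
    parts = []
    for name, info in sorted(tools.items()):
        cite = name
        if info.get("version"):
            cite += " v" + info["version"]
        if info.get("doi"):
            cite += " (doi:{})".format(info["doi"])
        parts.append(cite)
    return "Analysis was performed with the {} pipeline, using {}.".format(
        pipeline, "; ".join(parts)
    )
```

Other output flags (BibTeX, a table, markdown) could reuse the same mapping with different renderers.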