Add CITATION.cff Citation File Format file to repo #1541

Closed
matthewfeickert opened this issue Jul 31, 2021 · 11 comments · Fixed by #1551
Labels
docs Documentation related

Comments

@matthewfeickert
Member

matthewfeickert commented Jul 31, 2021

Description

@lukasheinrich @kratsg GitHub now will add a citation button to repositories that have a Citation File Format file (CITATION.cff) on the default branch. This seems pretty good in general for citation of software. Also Zenodo has now added support for CITATION.cff for linked GitHub repositories, which means that we can probably deprecate our .zenodo.json in favor of a CITATION.cff.

However, it seems that it is not a smooth replacement in all areas, especially as pyhf is trying to be careful about how people cite the software, so that we get consistent citations and people cite the JOSS paper DOI at the same time.

On my fork of pyhf I've been playing around with the CFF file format and the following

cff-version: 1.1.0
message: "Please cite the following works when using this software."
type: software
authors:
- family-names: "Heinrich"
  given-names: "Lukas"
  orcid: "https://orcid.org/0000-0002-4048-7584"
- family-names: "Feickert"
  given-names: "Matthew"
  orcid: "https://orcid.org/0000-0003-4124-7862"
- family-names: "Stark"
  given-names: "Giordon"
  orcid: "https://orcid.org/0000-0001-6616-3433"
title: "scikit-hep/pyhf: v0.6.2"
version: 0.6.2
doi: 10.5281/zenodo.1169739
repository-code: "https://github.com/scikit-hep/pyhf"
url: "https://pyhf.readthedocs.io/en/v0.6.2/"
keywords:
  - python
  - physics
  - statistics
  - fitting
  - scipy
  - numpy
  - tensorflow
  - pytorch
  - jax
  - auto-differentiation
license: "Apache-2.0"
references:
  - type: article
    authors:
    - family-names: "Heinrich"
      given-names: "Lukas"
      orcid: "https://orcid.org/0000-0002-4048-7584"
    - family-names: "Feickert"
      given-names: "Matthew"
      orcid: "https://orcid.org/0000-0003-4124-7862"
    - family-names: "Stark"
      given-names: "Giordon"
      orcid: "https://orcid.org/0000-0001-6616-3433"
    - family-names: "Cranmer"
      given-names: "Kyle"
      orcid: "https://orcid.org/0000-0002-5769-7094"
    title: "pyhf: pure-Python implementation of HistFactory statistical models"
    doi: 10.21105/joss.02823
    url: "https://doi.org/10.21105/joss.02823"
    year: 2021
    publisher: The Open Journal
    volume: 6
    number: 58
    pages: 2823
    journal: Journal of Open Source Software

produces a window like (notice that the message seems to get overwritten by GitHub)

[Image: example_box]

and the copied citation gives

@misc{Heinrich_scikitheppyhf_v0.6.2_2021,
author = {Heinrich, Lukas and Feickert, Matthew and Stark, Giordon},
doi = {10.5281/zenodo.1169739},
month = {6},
title = {scikit-hep/pyhf: v0.6.2},
url = {https://github.com/scikit-hep/pyhf},
year = {2021}
}

which is a bit different from our preferred citation (of just the software) of

@software{pyhf,
author = {Lukas Heinrich and Matthew Feickert and Giordon Stark},
title = "{pyhf: v0.6.2}",
version = {0.6.2},
doi = {10.5281/zenodo.1169739},
url = {https://doi.org/10.5281/zenodo.1169739},
note = {https://github.com/scikit-hep/pyhf/releases/tag/v0.6.2}
}

and doesn't get the reference citation for the JOSS paper

@article{pyhf_joss,
doi = {10.21105/joss.02823},
url = {https://doi.org/10.21105/joss.02823},
year = {2021},
publisher = {The Open Journal},
volume = {6},
number = {58},
pages = {2823},
author = {Lukas Heinrich and Matthew Feickert and Giordon Stark and Kyle Cranmer},
title = {pyhf: pure-Python implementation of HistFactory statistical models},
journal = {Journal of Open Source Software}
}

at all.
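
One way to keep the two preferred entries above consistent with each other is to generate both from the same author metadata. The following is only a minimal sketch under that assumption: `format_authors` and `bibtex_entry` are hypothetical helper names, and this is not how GitHub builds its "Cite this repository" output.

```python
def format_authors(authors):
    """Render CFF-style author mappings as 'Given Family and Given Family'."""
    return " and ".join(
        f"{author['given-names']} {author['family-names']}" for author in authors
    )


def bibtex_entry(entry_type, key, fields):
    """Build a BibTeX entry string from an ordered mapping of fields."""
    body = ",\n".join(f"  {name} = {{{value}}}" for name, value in fields.items())
    return f"@{entry_type}{{{key},\n{body}\n}}"


# Author data copied from the CITATION.cff draft above (ORCIDs omitted).
software_authors = [
    {"family-names": "Heinrich", "given-names": "Lukas"},
    {"family-names": "Feickert", "given-names": "Matthew"},
    {"family-names": "Stark", "given-names": "Giordon"},
]
article_authors = software_authors + [
    {"family-names": "Cranmer", "given-names": "Kyle"}
]

software_bib = bibtex_entry(
    "software",
    "pyhf",
    {
        "author": format_authors(software_authors),
        "title": "{pyhf: v0.6.2}",
        "version": "0.6.2",
        "doi": "10.5281/zenodo.1169739",
        "url": "https://doi.org/10.5281/zenodo.1169739",
    },
)
article_bib = bibtex_entry(
    "article",
    "pyhf_joss",
    {
        "author": format_authors(article_authors),
        "title": "pyhf: pure-Python implementation of HistFactory statistical models",
        "journal": "Journal of Open Source Software",
        "year": "2021",
        "doi": "10.21105/joss.02823",
    },
)
print(software_bib)
print(article_bib)
```

Writing the names as "Given Family" here matches the preferred entries, in contrast to the "Family, Given" form in the citation GitHub generates.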

There's also no CFF validator that I know of that will check that everything is valid. Given that we've got our .zenodo.json config file set up so that we know it works (thanks to help from @lnielsen), we should be careful not to break Zenodo with this new format.

Example libraries that are using the CITATION.cff now

Similar ongoing discussion on the topic

cc @danielskatz @cranmer

Other related references

@agoose77
Contributor

agoose77 commented Aug 9, 2021

@matthewfeickert RE the need for a validator, we can use the JSON schema with an online tool, e.g.:
https://www.jsonschemavalidator.net/s/FrEm3KE2

@matthewfeickert matthewfeickert self-assigned this Aug 9, 2021
@matthewfeickert
Member Author

> @matthewfeickert RE the need for a validator, we can use the JSON schema with an online tool, e.g.:
> https://www.jsonschemavalidator.net/s/FrEm3KE2

True, though ideally you shouldn't have to be doing copypaste to validate schemas, but be able to validate in CI.

@agoose77
Contributor

agoose77 commented Aug 9, 2021

> > @matthewfeickert RE the need for a validator, we can use the JSON schema with an online tool, e.g.:
> > https://www.jsonschemavalidator.net/s/FrEm3KE2
>
> True, though ideally you shouldn't have to be doing copypaste to validate schemas, but be able to validate in CI.

If we need a tool to do this, perhaps we could leverage jsonschema, i.e.

jsonschema <(curl "https://citation-file-format.github.io/1.2.0/schema.json") -i /tmp/schema.json

@matthewfeickert
Member Author

> If we need a tool to do this, perhaps we could leverage jsonschema, i.e.
>
> jsonschema <(curl "https://citation-file-format.github.io/1.2.0/schema.json") -i /tmp/schema.json

Yeah the v1.2.0 CHANGELOG makes this nicer

switched from YAML schema to JSON schema
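
As a rough illustration of the "validate in CI" idea discussed above, here is a standard-library-only sketch of a pre-check that returns a process exit code. It is not a substitute for validating against the full schema with jsonschema. The function names are hypothetical, and the list of required top-level keys is an assumption based on the CFF 1.2.0 schema that should be confirmed against the published schema.json.

```python
import json

# Top-level keys required by the CFF 1.2.0 schema (assumption: verify
# against the published schema.json before relying on this list).
REQUIRED_KEYS = ("authors", "cff-version", "message", "title")


def missing_required(document):
    """Return the required top-level keys absent from a parsed CFF document."""
    return sorted(key for key in REQUIRED_KEYS if key not in document)


def check(json_text):
    """Return a process exit code: 0 if all required keys exist, 1 otherwise."""
    missing = missing_required(json.loads(json_text))
    for key in missing:
        print(f"missing required key: {key}")
    return 1 if missing else 0


# Hypothetical instance, as a YAML-to-JSON converter might emit it.
example = json.dumps(
    {
        "cff-version": "1.2.0",
        "message": "Please cite the following works when using this software.",
        "title": "scikit-hep/pyhf: v0.6.2",
    }
)
exit_code = check(example)  # reports that "authors" is missing
```

In a CI job, the value returned by `check` could be passed to `sys.exit` after converting CITATION.cff to JSON, so that a missing key fails the build.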

@matthewfeickert
Member Author

Hm. It seems to be having trouble with some aspects of publisher if you add in references

...
  "references": [
    {
      "type": "article",
      "authors": [
        {
          "family-names": "Heinrich",
          "given-names": "Lukas",
          "orcid": "https://orcid.org/0000-0002-4048-7584"
        },
        {
          "family-names": "Feickert",
          "given-names": "Matthew",
          "orcid": "https://orcid.org/0000-0003-4124-7862"
        },
        {
          "family-names": "Stark",
          "given-names": "Giordon",
          "orcid": "https://orcid.org/0000-0001-6616-3433"
        },
        {
          "family-names": "Cranmer",
          "given-names": "Kyle",
          "orcid": "https://orcid.org/0000-0002-5769-7094"
        }
      ],
      "title": "pyhf: pure-Python implementation of HistFactory statistical models",
      "doi": "10.21105/joss.02823",
      "url": "https://doi.org/10.21105/joss.02823",
      "year": 2021,
      "publisher": "The Open Journal",
      "volume": 6,
      "number": 58,
      "pages": 2823,
      "journal": "Journal of Open Source Software"
    }
  ]
}
$ jsonschema <(curl -sL "https://citation-file-format.github.io/1.2.0/schema.json") -i cff-schema.json 
The Open Journal: 'The Open Journal' is not of type 'object'

@matthewfeickert
Member Author

which as @kratsg pointed out to me could be solved with

      "publisher": {"name": "The Open Journal"},

@matthewfeickert
Member Author

@agoose77 Any thoughts here? I'm working on my fork so that I can preview the result https://github.com/matthewfeickert/pyhf/blob/master/CITATION.cff

but if I download that

$ curl -sLO https://raw.githubusercontent.com/matthewfeickert/pyhf/6423b74f8d52a7bc145fff38786761c1b617651b/CITATION.cff
$ python -m pip install yq

and run

$ jsonschema <(curl -sL "https://citation-file-format.github.io/1.2.0/schema.json") --instance <(cat CITATION.cff | yq)
[{'name': 'The Open Journal'}]: [{'name': 'The Open Journal'}] is not of type 'object'

so how can I get

"publisher": {"name": "The Open Journal"},

instead of

"publisher": [{"name": "The Open Journal"}],

?

@matthewfeickert
Member Author

Oh whoops. I had

    publisher:
    - name: "The Open Journal"

which gives

      "publisher": [
        {
          "name": "The Open Journal"
        }
      ],

while

    publisher:
      name: "The Open Journal"

works giving

      "publisher": {
        "name": "The Open Journal"
      },

YAML is annoying at times.
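
The difference between the two renderings above can be reproduced with a hand-written type check. This is only a sketch: `publisher_error` is a hypothetical helper that mimics the "must be an object" rule reported in the validator output in this thread. It is not jsonschema's implementation.

```python
import json

# The two JSON renderings from the comment above.
as_sequence = json.loads('{"publisher": [{"name": "The Open Journal"}]}')
as_mapping = json.loads('{"publisher": {"name": "The Open Journal"}}')


def publisher_error(reference):
    """Return an error string if 'publisher' is present but not a mapping."""
    publisher = reference.get("publisher")
    if publisher is None or isinstance(publisher, dict):
        return None
    return f"{publisher!r} is not of type 'object'"


sequence_error = publisher_error(as_sequence)  # list form is rejected
mapping_error = publisher_error(as_mapping)  # mapping form is accepted
print(sequence_error)
print(mapping_error)
```

A leading `- ` in YAML starts a sequence item, so `- name: ...` yields a list containing one mapping, while an indented `name: ...` yields the mapping itself.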

@matthewfeickert
Member Author

From the pyhf v0.6.3 release we learned that the CITATION.cff file will cause Zenodo to place the abstract and the message fields in the "description" entry of the archive

pyhf/CITATION.cff

Lines 34 to 42 in 248e400

abstract: |
  The HistFactory p.d.f. template is per-se independent of its implementation
  in ROOT and it is useful to be able to run statistical analysis outside of
  the ROOT, RooFit, RooStats framework. pyhf is a pure-python implementation
  of that statistical model for multi-bin histogram-based analysis and its
  interval estimation is based on the asymptotic formulas of "Asymptotic
  formulae for likelihood-based tests of new physics". pyhf supports modern
  computational graph libraries such as TensorFlow, PyTorch, and JAX in order
  to make use of features such as autodifferentiation and GPU acceleration.

message: "Please cite the following works when using this software."

What we'd like is for the current description field from the .zenodo.json to be used

"description": "pure-Python HistFactory implementation with tensors and autodiff",

but there isn't a description field in CITATION.cff.

CITATION.cff will also flip the names of the authors

[Image: names_backwards]

which isn't great.

@lnielsen Any thoughts on how we can avoid this? Or would you prefer we open an Issue on Zenodo?

@lnielsen
Contributor

lnielsen commented Sep 7, 2021

@matthewfeickert The best option is to report it on the Zenodo support line: https://zenodo.org/support

@matthewfeickert
Member Author

Thanks Lars!
