Enhancements to get_bibentry #268

Closed
pitkant opened this issue Aug 16, 2023 · 4 comments · Fixed by #270

@pitkant
Member

pitkant commented Aug 16, 2023

The current implementation of get_bibentry produces some nonsensical results. Take, for example, the example in the function documentation:

library(eurostat)

my_bibliography <- get_bibentry(
  code = c("tran_hv_frtra", "t2020_rk310", "tec00001"),
  keywords = list(
    c("railways", "freight", "transport"),
    c("railways", "passengers", "modal split")
  ),
  format = "Biblatex"
)

It prints the following:

> my_bibliography
@Misc{tec00001_15-08-2023,
  title = {Gross domestic product at market prices [tran_hv_frtra]},
  url = {https://ec.europa.eu/eurostat/web/products-datasets/-/tran_hv_frtra},
  language = {en},
  year = {15.08.2023},
  publisher = {Eurostat},
  author = {{Eurostat}},
  keywords = {railways, freight, transport},
  urldate = {2023-08-16},
}

@Misc{tran_hv_frtra_15-03-2023,
  title = {Volume of freight transport relative to GDP [t2020_rk310]},
  url = {https://ec.europa.eu/eurostat/web/products-datasets/-/t2020_rk310},
  language = {en},
  year = {15.03.2023},
  publisher = {Eurostat},
  author = {{Eurostat}},
  keywords = {railways, passengers, modal split},
  urldate = {2023-08-16},
}

Several problems are visible:

  • t2020_rk310 could not be downloaded because it does not exist in the TOC. No error is shown to the user, and one would expect a non-existent item to throw off the loop that constructs the bibentries, which leads to the next point:
  • Entries tec00001_15-08-2023 and tran_hv_frtra_15-03-2023 have the correct titles but the wrong codes in the square brackets at the end of the title (tran_hv_frtra and t2020_rk310, respectively).
  • Curiously, removing t2020_rk310 from the code argument does not solve the problem; the codes in the titles are still wrong.
  • The year field contains a full date in dd.mm.yyyy format. BibLaTeX has a date field where a full date (written as yyyy-mm-dd) would be fine, whereas BibTeX supports only year. Maybe storing the more precise date internally, and writing only the year to the year field, would be the solution?
  • When output is in BibTeX format, the urldate field is automatically converted into a note field: "note = {Last visited on mm/dd/yyyy}". The American date format is a bit jarring; dd/mm/yyyy or yyyy-mm-dd would be better.
  • The European Commission should probably be attributed somehow. According to the data.europa.eu documentation, DCAT-AP compliant bibliographic entries would look like this:

European Commission, Eurostat, 'Airport traffic data by reporting airport and airlines' (avia_tf_apal), most recent data 2021-09-01, https://ec.europa.eu/eurostat/databrowser/view/avia_tf_apal/default/table?lang=en

European Commission, Eurostat, 'Total length of motorways' (ttr00002), accessed 2021-10-15, https://ec.europa.eu/eurostat/databrowser/view/ttr00002/default/table?lang=en

European Commission, Eurostat, 'Real GDP growth rate -- volume' (tec00115), updated 2021-09-28, https://ec.europa.eu/eurostat/databrowser/view/tec00115/default/table?lang=en

  • One nice aspect of the current implementation is that it links to the dataset landing page rather than the data browser view. This should be retained.
  • Are RefManageR as a package import and BibLaTeX as an output option really necessary? For example, the pxweb package simply uses base R utils::bibentry and the print(utils::citation) method, which is probably all most users need. RefManageR seemed most useful for inserting citations into .md/.Rmd files in RStudio and managing bibliographies there, but it now seems overkill for the simple purpose of printing a data citation.
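To make the points above more concrete, here is a minimal sketch of a base R approach built on utils::bibentry, in the spirit of the pxweb comparison. It assumes a TOC data frame with columns code, title and last.update.of.data (dates as dd.mm.yyyy, as in the output above); make_dataset_bibentries() and the toy TOC are hypothetical illustrations, not the eurostat package's actual code. Each code is looked up with match() rather than by position, so a missing code only triggers a warning and cannot shift titles and codes out of alignment. Dates are written in ISO 8601 form: year only in year, the full date in date and urldate.

```r
# Sketch only: `toc` stands in for a TOC data frame with (assumed) columns
# `code`, `title` and `last.update.of.data` ("dd.mm.yyyy" strings).
# make_dataset_bibentries() is a hypothetical helper, not the eurostat API.
make_dataset_bibentries <- function(codes, toc, keywords = NULL,
                                    access_date = Sys.Date()) {
  # Match each requested code by value, so a missing code cannot
  # misalign titles and codes
  idx <- match(codes, toc$code)
  if (anyNA(idx)) {
    warning("Not found in TOC, skipped: ",
            paste(codes[is.na(idx)], collapse = ", "))
  }
  entries <- lapply(which(!is.na(idx)), function(i) {
    row <- toc[idx[i], ]
    updated <- as.Date(row$last.update.of.data, format = "%d.%m.%Y")
    fields <- list(
      bibtype = "Misc",
      key = paste0(row$code, "_", format(updated, "%Y-%m-%d")),
      title = paste0(row$title, " (", row$code, ")"),
      author = utils::person(given = "Eurostat"),
      publisher = "Eurostat",
      # Dataset landing page rather than the data browser view
      url = paste0("https://ec.europa.eu/eurostat/web/products-datasets/-/",
                   row$code),
      year = format(updated, "%Y"),          # four-digit year only (BibTeX)
      date = format(updated, "%Y-%m-%d"),    # full ISO 8601 date (BibLaTeX)
      urldate = format(access_date, "%Y-%m-%d")
    )
    # Keywords are optional and indexed in the same order as `codes`
    if (!is.null(keywords) && i <= length(keywords)) {
      fields$keywords <- paste(keywords[[i]], collapse = ", ")
    }
    do.call(utils::bibentry, fields)
  })
  # Combine the individual entries into one bibentry object
  do.call(c, entries)
}

# Toy TOC with titles and dates taken from the output above (illustrative)
toc <- data.frame(
  code = c("tec00001", "tran_hv_frtra"),
  title = c("Gross domestic product at market prices",
            "Volume of freight transport relative to GDP"),
  last.update.of.data = c("15.08.2023", "15.03.2023")
)

bib <- make_dataset_bibentries(
  c("tran_hv_frtra", "t2020_rk310", "tec00001"), toc,
  keywords = list(
    c("railways", "freight", "transport"),
    c("railways", "passengers", "modal split")
  ),
  access_date = as.Date("2023-08-16")
)
print(bib, style = "Bibtex")
```

Because the output is a plain bibentry object, it can be printed with the base print and toBibtex methods without RefManageR.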

This is linked to the original issue where this was discussed: #128
Possibly also related: #199

sessionInfo:

> sessionInfo()
R version 4.3.1 (2023-06-16)
Platform: aarch64-apple-darwin20 (64-bit)
Running under: macOS Ventura 13.5

Matrix products: default
BLAS:   /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib 
LAPACK: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.11.0

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

time zone: Europe/Helsinki
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] eurostat_4.0.0.9003

loaded via a namespace (and not attached):
 [1] utf8_1.2.3         generics_0.1.3     tidyr_1.3.0        class_7.3-22      
 [5] xml2_1.3.5         KernSmooth_2.23-22 stringi_1.7.12     hms_1.1.3         
 [9] digest_0.6.33      magrittr_2.0.3     countrycode_1.5.0  timechange_0.2.0  
[13] ISOweek_0.6-2      cellranger_1.1.0   rprojroot_2.0.3    plyr_1.8.8        
[17] jsonlite_1.8.7     e1071_1.7-13       backports_1.4.1    httr_1.4.6        
[21] purrr_1.0.2        fansi_1.0.4        regions_0.1.8      bibtex_0.5.1      
[25] cli_3.6.1          crayon_1.5.2       rlang_1.1.1        bit64_4.0.5       
[29] withr_2.5.0        parallel_4.3.1     tools_4.3.1        tzdb_0.4.0        
[33] dplyr_1.1.2        here_1.0.1         curl_5.0.2         assertthat_0.2.1  
[37] vctrs_0.6.3        R6_2.5.1           proxy_0.4-27       lifecycle_1.0.3   
[41] lubridate_1.9.2    classInt_0.4-9     RefManageR_1.4.0   stringr_1.5.0     
[45] bit_4.0.5          vroom_1.6.3        pkgconfig_2.0.3    pillar_1.9.0      
[49] glue_1.6.2         Rcpp_1.0.11        tibble_3.2.1       tidyselect_1.2.0  
[53] rstudioapi_0.15.0  readr_2.1.4        compiler_4.3.1     readxl_1.4.3      
@antagomir
Member

Are having RefManageR as a package import and BibLaTeX as an output option really necessary? For example pxweb package just utilises the base R utils::bibentry and print(utils::citation) method which would probably be what most users need. RefManageR seemed to be most useful if I wanted to insert citations in .md/.Rmd files in RStudio and manage my bibliographies there but now it seems a bit overkill for the simple purpose of printing a data citation.

-> I tend to agree. It might also be useful to support similar conventions across packages in general.

@pitkant
Member Author

pitkant commented Aug 23, 2023

I sent a question to Eurostat user support and received the following instructions on how to properly cite Eurostat datasets:

Users are free to choose the method to cite Eurostat’s datasets. However, the following guidelines must be followed to reference our statistical data:

· The origin of the data should always be mentioned as “Source: Eurostat”.

· The online dataset code(s) should also be provided in order to ensure transparency and facilitate access to the Eurostat data and related methodological information. For example: “Source: Eurostat (online data code: namq_10_gdp)”

· Online publications (e.g. web pages, PDF) should include a clickable link to the dataset using the bookmark functionality available in the Eurostat data browser.

It should be avoided to associate different entities (e.g. Eurostat, National Statistical Offices, other data providers) to the same dataset or indicator without specifying the role of each of them in the treatment of data.

Something to take into consideration here.
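As a small sketch of how the first two guidelines could translate into an attribution string: eurostat_source_note() is a hypothetical helper, not an existing eurostat function, and the data browser URL pattern is assumed from the DCAT-AP examples quoted earlier in this thread rather than taken from Eurostat's bookmark feature.

```r
# Hypothetical helper: builds the short attribution line described in the
# guidelines, e.g. "Source: Eurostat (online data code: namq_10_gdp)".
# The data browser URL pattern below is an assumption.
eurostat_source_note <- function(codes, link = FALSE) {
  note <- paste0("Source: Eurostat (online data code",
                 if (length(codes) > 1) "s" else "", ": ",
                 paste(codes, collapse = ", "), ")")
  if (link) {
    # One data browser link per dataset code, on separate lines
    urls <- paste0("https://ec.europa.eu/eurostat/databrowser/view/",
                   codes, "/default/table?lang=en")
    note <- paste(c(note, urls), collapse = "\n")
  }
  note
}

cat(eurostat_source_note("namq_10_gdp", link = TRUE), "\n")
```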

@pitkant pitkant linked a pull request Aug 23, 2023 that will close this issue
@pitkant
Member Author

pitkant commented Aug 24, 2023

After a further inquiry, Eurostat user support clarified the "guidelines that must be followed":

Please note that the three general guidelines sent have been prepared for the citation of statistical data in European Commission publications (traditional publications but also social media, website posts…) to enable the users to quickly and easily identify the origin of the data and, if needed, access the source data for themselves, while not overloading the main document with excessive detail. Therefore, unfortunately, no further details are provided for academic articles.

and then added the following information, which is also included in the documentation of at least some eurostat package functions:

Eurostat does not advocate a specific way of making bibliographic citations to Eurostat data in academic articles and, unfortunately, we do not provide guidance in this regard.

Simply, what is required under our copyright notice is to mention that the source of the data is Eurostat and, where possible, to provide the link: Copyright notice and free re-use of data - Eurostat (europa.eu)

All statistical data, metadata, content of web pages or other dissemination tools, official publications and other documents published on its website, with the exceptions listed below, can be reused without any payment or written licence provided that:

  • the source is indicated as Eurostat;
  • when re-use involves modifications to the data or text, this must be stated clearly to the end user of the information.

So it seems that Eurostat user support could not give us definitive advice.

I think returning roughly to the previous practice is the safest option. I removed the mention of the European Commission from the Eurostat citation, as it appeared only on the data.europa.eu website, and I'm not sure whether the Publications Office of the European Union is the authority that decides whether the European Commission should be attributed.

@pitkant
Member Author

pitkant commented Dec 20, 2023

Closed with the CRAN release of package version 4.0.0
