BUG: au_orcid is of type logical when all authors have no ORCID (should be #220

Open
rkrug opened this issue Mar 21, 2024 · 2 comments
Labels
bug Something isn't working priority: high

Comments

@rkrug

rkrug commented Mar 21, 2024

Hi

There is a bug in the conversion from the oa results to a data.frame / tibble. When none of the authors of a work has an ORCID, the column `au_orcid` is of type `logical`, while in the other works it is, as expected, of type `character`. This causes problems because I want to save these as parquet files, which does not work if the objects are of different types.

Replacing `NA` with `as.character(NA)` in the appropriate places would probably fix this issue. I assume the same can occur in other fields.

Thanks,

Rainer

Classes ‘tbl_df’, ‘tbl’ and 'data.frame':       2 obs. of  6 variables:
 $ id              : chr  "https://openalex.org/W2101204002" "https://openalex.org/W2159758382"
 $ author          :List of 2
  ..$ :'data.frame':    7 obs. of  11 variables:
  .. ..$ au_id                   : chr  "https://openalex.org/A5066931706" "https://openalex.org/A5047672302" "https://openalex.org/A5006051784" "https://openalex.org/A5037636565" ...
  .. ..$ au_display_name         : chr  "Philippe Cury" "Andrew Bakun" "Robert J. M. Crawford" "Astrid Jarre" ...
  .. ..$ au_orcid                : chr  NA "https://orcid.org/0000-0002-4366-3846" NA "https://orcid.org/0000-0002-0690-6183" ...
  .. ..$ author_position         : chr  "first" "middle" "middle" "middle" ...
  .. ..$ au_affiliation_raw      : chr  "Institut de Recherche pour le Développement (IRD), Marine and Coastal ManagementPrivate Bag X2, Rogge Bay 8012,"| __truncated__ "University of Cape Town, Department of Oceanography7701 Rondebosch, South Africa" "Marine and Coastal ManagementPrivate Bag X2, 8012 Rogge Bay, Cape Town, South Africa" "Danish Institute for Fisheries Research, North Sea CentrePO Box 101, 9850 Hirtshals, Denmark" ...
  .. ..$ institution_id          : chr  "https://openalex.org/I157614274" "https://openalex.org/I157614274" NA NA ...
  .. ..$ institution_display_name: chr  "University of Cape Town" "University of Cape Town" NA NA ...
  .. ..$ institution_ror         : chr  "https://ror.org/03p74gp79" "https://ror.org/03p74gp79" NA NA ...
  .. ..$ institution_country_code: chr  "ZA" "ZA" NA NA ...
  .. ..$ institution_type        : chr  "education" "education" NA NA ...
  .. ..$ institution_lineage     : chr  "https://openalex.org/I157614274" "https://openalex.org/I157614274" NA NA ...
  ..$ :'data.frame':    2 obs. of  11 variables:
  .. ..$ au_id                   : chr  "https://openalex.org/A5023041174" "https://openalex.org/A5028708328"
  .. ..$ au_display_name         : chr  "Wilfred M. Post" "Kyeol Kwon"
  .. ..$ au_orcid                : logi  NA NA
  .. ..$ author_position         : chr  "first" "last"
  .. ..$ au_affiliation_raw      : chr  "Environmental Sciences Division, Oak Ridge National Laboratory, Oak Ridge, TN 37831-6335, USA;" "Chemical Engineering Department, Tuskeegee University, Tuskeegee, AL 36088, USA"
  .. ..$ institution_id          : chr  "https://openalex.org/I1289243028" "https://openalex.org/I6026837"
  .. ..$ institution_display_name: chr  "Oak Ridge National Laboratory" "Tuskegee University"
  .. ..$ institution_ror         : chr  "https://ror.org/01qz5mb56" "https://ror.org/0137n4m74"
  .. ..$ institution_country_code: chr  "US" "US"
  .. ..$ institution_type        : chr  "facility" "education"
  .. ..$ institution_lineage     : chr  "https://openalex.org/I1289243028, https://openalex.org/I39565521, https://openalex.org/I4210159294" "https://openalex.org/I6026837"
 $ ab              : chr  "In upwelling ecosystems, there is often a crucial intermediate trophic level, occupied by small, plankton-feedi"| __truncated__ "Summary When agricultural land is no longer used for cultivation and allowed to revert to natural vegetation or"| __truncated__
 $ publication_year: int  2000 2000
 $ doi             : chr  "https://doi.org/10.1006/jmsc.2000.0712" "https://doi.org/10.1046/j.1365-2486.2000.00308.x"
 $ page            : num  1 1
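
Until this is fixed in the package, one possible workaround is to coerce the affected columns after the conversion. The following is only a minimal sketch: `fix_all_na_logical()` is a hypothetical helper (not part of openalexR), and the two small data frames are toy stand-ins for the nested `author` entries shown above. Note that it would also convert a genuinely logical column that happens to be all `NA`.

```r
# Hypothetical helper (not part of openalexR): convert every column that is
# logical and entirely NA to character, leaving all other columns untouched.
fix_all_na_logical <- function(df) {
  df[] <- lapply(df, function(col) {
    if (is.logical(col) && all(is.na(col))) as.character(col) else col
  })
  df
}

# Toy stand-ins for two nested author data frames
authors <- list(
  data.frame(
    au_id = c("https://openalex.org/A5066931706", "https://openalex.org/A5047672302"),
    au_orcid = c(NA, "https://orcid.org/0000-0002-4366-3846")
  ),
  data.frame(
    au_id = c("https://openalex.org/A5023041174", "https://openalex.org/A5028708328"),
    au_orcid = c(NA, NA)  # all NA, so this column starts out as logical
  )
)

fixed <- lapply(authors, fix_all_na_logical)

# Column type of au_orcid in each data frame after the fix
types_after <- vapply(fixed, function(df) class(df$au_orcid), character(1))
print(types_after)
```

Applied to the real result, this would be something like `works$author <- lapply(works$author, fix_all_na_logical)`, where `works` is an assumed name for the converted tibble.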
@trangdata
Collaborator

A quick fix would be to add a parameter to `replace_w_na`:

replace_w_na <- function(x, y = NA){
  lapply(x, `%||%`, y = y)
}

and then on this line

prepend(replace_w_na(l$author), "au")

do

prepend(replace_w_na(l$author, NA_character_), "au")

But you're right, this can happen in other fields as well. `rbind.data.frame` automatically coerces `NA` to the type of the non-`NA` values, but if all values are `NA`, the column remains of type logical. I'll wait to see if others can chime in.
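
The behaviour described here can be reproduced in isolation. This is a small sketch, not the package's actual code path: `%||%` is defined locally on the assumption that it is the usual "use `y` if `x` is `NULL`" operator, and the `author` list is a toy record.

```r
# Fallback operator, defined here so the sketch is self-contained
# (assumed to match the `%||%` used by replace_w_na above).
`%||%` <- function(x, y) if (is.null(x)) y else x

replace_w_na <- function(x, y = NA) {
  lapply(x, `%||%`, y = y)
}

# Toy author record whose ORCID is missing (NULL)
author <- list(id = "https://openalex.org/A5023041174", orcid = NULL)

default_type <- class(replace_w_na(author)$orcid)                 # plain NA
typed_type   <- class(replace_w_na(author, NA_character_)$orcid)  # typed NA

# rbind coerces a logical NA when a character value is present ...
mixed <- rbind(
  data.frame(orcid = NA),
  data.frame(orcid = "https://orcid.org/0000-0002-4366-3846")
)
# ... but has nothing to coerce to when every value is NA.
all_na <- rbind(data.frame(orcid = NA), data.frame(orcid = NA))

print(c(default = default_type, typed = typed_type,
        mixed = class(mixed$orcid), all_na = class(all_na$orcid)))
```

With `NA_character_` as the fill value, the column is character from the start, so the result no longer depends on whether any other row supplies a character value.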

@rkrug
Author

rkrug commented Mar 26, 2024

Is there any chance that you could take a look at this? The easiest approach might be to define a template with the expected types and then apply it. At the moment I only have an extremely slow fix running to correct this, and I am working with millions of records...
Thanks.
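
The template idea could look roughly like the sketch below. Everything in it is an assumption for illustration: `author_template` and `apply_template()` are hypothetical names, and only a few of the author columns from the `str()` output above are listed.

```r
# Hypothetical template: column name -> coercion function giving the
# expected type. Column names are taken from the str() output above.
author_template <- list(
  au_id           = as.character,
  au_display_name = as.character,
  au_orcid        = as.character,
  author_position = as.character
)

# Hypothetical helper: coerce every column that appears in the template;
# columns not listed in the template are left as they are.
apply_template <- function(df, template) {
  for (col in intersect(names(template), names(df))) {
    df[[col]] <- template[[col]](df[[col]])
  }
  df
}

# Toy version of the second author data frame from the report
authors <- data.frame(
  au_id           = c("https://openalex.org/A5023041174", "https://openalex.org/A5028708328"),
  au_display_name = c("Wilfred M. Post", "Kyeol Kwon"),
  au_orcid        = c(NA, NA),
  author_position = c("first", "last")
)

typed <- apply_template(authors, author_template)
str(typed)
```

Unlike a check for all-`NA` logical columns, a template makes the expected type explicit per column, so it does not depend on the data that happens to be present.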
