Update to API v4 #57

Merged: willgearty merged 31 commits into master from api_v4 on Sep 30, 2024


willgearty
Contributor

@willgearty willgearty commented Sep 13, 2024

This is a work-in-progress PR to update the package to the new version (v4) of the Red List API.

Note that this new version of the API is a HUGE rework: the entire API is now designed around assessments, whereas the previous version was mostly designed around species. I think this should still be fine for most people because you can still get all of the species for a given habitat, for example, but you'll get multiple assessments per species, which you'll then need to clean up. (Alternatively, there could be a latest argument for any assessment summary function that returns only the latest assessments, which I believe should give only one assessment per species.)
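
The clean-up step described above could look roughly like the following sketch. It works on a toy data frame rather than a live API response. The column names sis_taxon_id, assessment_id, year_published, and latest are assumptions about the shape of an assessment summary table, and keep_latest_per_taxon() is a hypothetical helper, not a function from the package.

```r
# Toy assessment summary table; column names are assumed for illustration.
assessments <- data.frame(
  sis_taxon_id = c(101, 101, 101, 202, 202),
  assessment_id = c(1, 2, 3, 4, 5),
  year_published = c("2008", "2015", "2021", "2010", "2019"),
  latest = c(FALSE, FALSE, TRUE, FALSE, TRUE),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep one row per taxon, preferring rows flagged as
# latest and breaking ties by the most recent publication year.
keep_latest_per_taxon <- function(x) {
  x$year_num <- as.numeric(as.character(x$year_published))
  # Within each taxon, flagged rows sort first, then the newest year.
  x <- x[order(x$sis_taxon_id, !x$latest, -x$year_num), , drop = FALSE]
  # After sorting, the first row seen for each taxon is the one to keep.
  out <- x[!duplicated(x$sis_taxon_id), , drop = FALSE]
  out$year_num <- NULL
  rownames(out) <- NULL
  out
}

latest_only <- keep_latest_per_taxon(assessments)
```

Sorting first and then dropping duplicated taxon IDs keeps the logic in base R and still returns a row for a taxon even if none of its assessments is flagged as latest.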

I've implemented about half-or-so of the endpoints so far and anticipate implementing the rest within the week. UPDATE: All endpoints are now implemented.

Here's the full TO-DO:

  • Implement all API endpoints
  • Update/add tests (93% coverage, woot!)
  • Review implementation and fix anything else
  • Rewrite vignette
  • Beta testing

Fixes #43, Fixes #48, Fixes #52, Fixes #55, Fixes #58

@jeffreyhanson

Hi,

Thanks again for all your work on this!

I manually tested some of the functions and all the tests passed on my computer (note that I deleted testthat/fixtures prior to running the tests to ensure it would query the API). Also, the package passed all checks on my computer too. Let me know if there's some particular function or functionality you would like me to test? I've included my session info below if useful.

I was wondering if it would be possible to implement a function to query species' assessments using their identifiers (i.e., a function for the /api/v4/taxa/sis/{sis_id} endpoint listed under Taxa in the API documentation)? This would be very useful for me because my workflows often involve downloading spatial data from the IUCN Red List (which provides species sis_id identifiers) and then using the rredlist package to obtain additional information for them.

I'm happy to prepare a PR if that would be useful?


R version 4.4.1 (2024-06-14)
Platform: x86_64-pc-linux-gnu
Running under: Ubuntu 22.04.4 LTS

Matrix products: default
BLAS/LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.20.so;  LAPACK version 3.10.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

time zone: Pacific/Auckland
tzcode source: system (glibc)

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] rredlist_0.7.1.9000 testthat_3.2.1.1    devtools_2.4.5     
[4] usethis_3.0.0      

loaded via a namespace (and not attached):
 [1] utf8_1.2.4        xml2_1.3.6        stringi_1.8.4     httpcode_0.3.0   
 [5] digest_0.6.37     magrittr_2.0.3    pkgload_1.4.0     fastmap_1.2.0    
 [9] rprojroot_2.0.4   jsonlite_1.8.8    processx_3.8.4    pkgbuild_1.4.4   
[13] sessioninfo_1.2.2 brio_1.1.5        rcmdcheck_1.4.0   crul_1.5.0       
[17] ps_1.8.0          urlchecker_1.0.1  promises_1.3.0    purrr_1.0.2      
[21] fansi_1.0.6       brew_1.0-10       cli_3.6.3         shiny_1.9.1      
[25] rlang_1.1.4       commonmark_1.9.1  ellipsis_0.3.2    remotes_2.5.0    
[29] withr_3.0.1       cachem_1.1.0      tools_4.4.1       memoise_2.0.1    
[33] httpuv_1.6.15     curl_5.2.2        vctrs_0.6.5       R6_2.5.1         
[37] mime_0.12         lifecycle_1.0.4   stringr_1.5.1     fs_1.6.4         
[41] htmlwidgets_1.6.4 xopen_1.0.1       miniUI_0.1.1.1    callr_3.7.6      
[45] pkgconfig_2.0.3   desc_1.4.3        pillar_1.9.0      later_1.3.2      
[49] glue_1.7.0        profvis_0.3.8     Rcpp_1.0.13       xfun_0.47        
[53] tibble_3.2.1      rstudioapi_0.16.0 knitr_1.48        xtable_1.8-4     
[57] htmltools_0.5.8.1 compiler_4.4.1    prettyunits_1.2.0 roxygen2_7.3.2

@willgearty
Contributor Author

Thanks for playing around with the new version of the package @jeffreyhanson, I really appreciate it! Not sure how I missed the taxa/sis endpoint, but that is now covered with the new function rl_sis(). Please let me know if there is any other functionality missing or if anything else behaves unlike what you'd expect or prefer.

@jeffreyhanson

Thanks for the quick response! Yeah, I just tested it and the new rl_sis() function works great!

I think that provides all the functionality I'd need, but there might be some opportunities to add some quality-of-life features if you wanted to? For example, I think most people will only want/need the latest assessment for a particular species. So if you wanted to help make the package easier to work with, you could have (i) a function that accepts a species name and genus, finds the latest assessment for it, and then returns the assessment info, and (ii) a function that accepts a species ID, finds the latest assessment for it, and then returns the assessment info? Just an idea. I know I'll need to implement a function like this in my own work, so it'd be handy if there was a function in the package that did this for me?

I suppose it would be nice if there were some wrapper functions that could provide backwards compatibility with the current (or soon to be previous version) of the package? However, my impression is that would be a lot of work, so probably not worth it? Especially since many of the functions in the current version have a region parameter that might need to be implemented manually?

@willgearty
Contributor Author

Thanks @jeffreyhanson, that's all super helpful! I've put together some quick wrappers to return the full version of the latest assessment for a given SIS ID or species name (rl_sis_latest() and rl_species_latest()). I also added support for the latest, year_published, and scope_code filters for all of the endpoints that support them. I checked the old documentation, and it looks like this scope_code filter is basically the same as the old region filter, so I've added some documentation about this.

@jeffreyhanson

jeffreyhanson commented Sep 29, 2024

Sorry for my slow response. I just tried out the new functions for getting the latest assessments for a given species, and they work perfectly. Thanks so much for implementing them! Also, great work supporting those scopes as well!

I'm not sure if this is unnecessary, but is it worth adding some logic to handle situations where none of the assessments are listed as latest? I ask because the "latest" information might contain mistakes, given that multiple assessments may be listed as latest (based on your recent commit)?

For example, instead of this:

  tmp <- rl_sis(id, key, ...)$assessments
  tmp_sub <- subset(tmp, tmp$latest)
  tmp_sub <- tmp_sub[order(tmp_sub$year_published, decreasing = TRUE), ]
  rl_assessment(id = tmp_sub$assessment_id[1], key = key, parse = parse, ...)

You could do something like this:

  tmp <- rl_sis(id, key, ...)$assessments
  # Fall back to all assessments when none is flagged as latest
  tmp_sub <- tmp
  if (any(tmp$latest, na.rm = TRUE)) {
    tmp_sub <- subset(tmp, tmp$latest)
  }
  tmp_sub <- tmp_sub[order(tmp_sub$year_published, decreasing = TRUE), ]
  rl_assessment(id = tmp_sub$assessment_id[1], key = key, parse = parse, ...)
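
The fallback idea can be checked without calling the API. The sketch below applies the same logic to hand-made data frames. pick_assessment_id() is a hypothetical helper written only for this illustration, and the toy tables assume nothing beyond the latest, year_published, and assessment_id columns used in the snippets.

```r
# Hypothetical helper that applies the fallback logic to a plain data frame.
pick_assessment_id <- function(tmp) {
  # Start from all assessments so there is always something to sort.
  tmp_sub <- tmp
  # Narrow to the flagged rows only when at least one row is flagged.
  if (any(tmp$latest, na.rm = TRUE)) {
    tmp_sub <- subset(tmp, tmp$latest)
  }
  tmp_sub <- tmp_sub[order(tmp_sub$year_published, decreasing = TRUE), ]
  tmp_sub$assessment_id[1]
}

# Case 1: two rows are flagged as latest, so the newer of the two is chosen.
flagged <- data.frame(
  assessment_id = c(11, 12, 13),
  year_published = c("2010", "2016", "2020"),
  latest = c(FALSE, TRUE, TRUE),
  stringsAsFactors = FALSE
)

# Case 2: no row is flagged, so the newest assessment overall is chosen.
unflagged <- data.frame(
  assessment_id = c(21, 22),
  year_published = c("2012", "2018"),
  latest = c(FALSE, NA),
  stringsAsFactors = FALSE
)

id_flagged <- pick_assessment_id(flagged)
id_unflagged <- pick_assessment_id(unflagged)
```

Initializing tmp_sub to the full table before the if block is what makes the no-flag case safe; without it, the sort line would have nothing to work on.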

Also, I noticed that the "year_published" column has years stored in a character format. I'm not super knowledgeable about character encoding, but is it possible that sorting the table based on this column without converting it to a numeric might result in an unexpected ordering? Also, if users are using older versions of R (where stringsAsFactors = TRUE by default), I wonder if this could also cause issues with unexpected orderings? To help avoid potential issues, I wonder if something like this would be useful?

  tmp_sub$year_published <- as.numeric(as.character(tmp_sub$year_published))
  tmp_sub <- tmp_sub[order(tmp_sub$year_published, decreasing = TRUE), , drop = FALSE]
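
To make the two concerns concrete, here is a small self-contained sketch on toy data (no API call, and the values are invented). It shows why going through as.character() matters if the column ever arrives as a factor, and what drop = FALSE protects against when the table has a single column.

```r
# If the years were ever stored as a factor, as.numeric() alone would return
# the internal level codes rather than the years themselves.
years_factor <- factor(c("2019", "2008", "2021"))
codes <- as.numeric(years_factor)
years <- as.numeric(as.character(years_factor))

# Sorting on the converted values gives a newest-first ordering.
newest_first <- years[order(years, decreasing = TRUE)]

# With a one-column data frame, row subsetting drops to a plain vector
# unless drop = FALSE is given.
one_col <- data.frame(year_published = years)
idx <- order(one_col$year_published, decreasing = TRUE)
dropped <- one_col[idx, ]
kept <- one_col[idx, , drop = FALSE]
```

Here codes holds level positions and years holds the actual values, so only the second is safe to sort on. kept is still a data frame, so a later tmp_sub$assessment_id style lookup would keep working.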

Hope that helps? But maybe I'm just being overly paranoid about issues that might never occur?

@willgearty
Contributor Author

Thanks @jeffreyhanson, I've implemented both of those great suggestions!

@willgearty willgearty merged commit b825de4 into master Sep 30, 2024
@willgearty willgearty deleted the api_v4 branch September 30, 2024 19:43