Package Review

Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide.

  • As the reviewer I confirm that there are no conflicts of interest for me to review this work (If you are unsure whether you are in conflict, please speak to your editor before starting your review).

Documentation

The package includes all the following forms of documentation:

  • A statement of need clearly stating problems the software is designed to solve and its target audience in README
  • Installation instructions: for the development version of package and any non-standard dependencies in README
  • Vignette(s) demonstrating major functionality that runs successfully locally
  • Function Documentation: for all exported functions in R help
  • Examples for all exported functions in R Help that run successfully locally
  • Community guidelines including contribution guidelines in the README or CONTRIBUTING, and DESCRIPTION with URL, BugReports and Maintainer (which may be autogenerated via Authors@R).

For packages co-submitting to JOSS

The package contains a paper.md matching JOSS's requirements with:

  • A short summary describing the high-level functionality of the software
  • Authors: A list of authors with their affiliations
  • A statement of need clearly stating problems the software is designed to solve and its target audience.
  • References: with DOIs for all those that have one (e.g. papers, datasets, software).

Functionality

  • Installation: Installation succeeds as documented.
  • Functionality: Any functional claims of the software have been confirmed.
  • Performance: Any performance claims of the software have been confirmed.
  • Automated tests: Unit tests cover essential functions of the package and a reasonable range of inputs and conditions. All tests pass on the local machine.
  • Packaging guidelines: The package conforms to the rOpenSci packaging guidelines

Final approval (post-review)

  • The author has responded to my review and made changes to my satisfaction. I recommend approving this package.

Estimated hours spent reviewing: 7


Review Comments

This package is a great and lightweight addition to working with rdf and linked data in R. Coming after my review of the codemetar package, which introduced me to linked data, I found it a great learning experience in something I'm really interested in but still quite a novice at, so I hope my feedback helps to convey that POV.

Overall I feel package functionality is complete and self-contained (apart from one error discussed below). My main feedback is regarding documentation, specifically how it could be improved to help novice users to grasp the value of semantic data and better understand how the package works.

installation

The only install comment I'll add is that when I first ran install(pkg_dir, dependencies = T, build_vignettes = T), the vignette build threw an error because the Suggests package ‘jqr’ had not been installed yet. It worked without build_vignettes = T.

pkg_dir <- "/Users/Anna/Documents/workflows/rOpenSci/reviews/rdflib-review/../rdflib"
devtools::install(pkg_dir, dependencies = T, build_vignettes = T)
#> Installing rdflib
#> '/Library/Frameworks/R.framework/Resources/bin/R' --no-site-file  \
#>   --no-environ --no-save --no-restore --quiet CMD build  \
#>   '/Users/Anna/Documents/workflows/rOpenSci/reviews/rdflib'  \
#>   --no-resave-data --no-manual
#> 
#> Error: Command failed (1)

with the console output:

* checking for file ‘/Users/Anna/Documents/workflows/rOpenSci/reviews/rdflib/DESCRIPTION’ ... OK
* preparing ‘rdflib’:
* checking DESCRIPTION meta-information ... OK
* installing the package to build vignettes
* creating vignettes ... ERROR
Quitting from lines 21-38 (rdflib.Rmd) 
Error: processing vignette 'rdflib.Rmd' failed with diagnostics:
there is no package called 'jqr'
Execution halted

Installing without building the vignette results in successful installation of jqr.

pkg_dir <- "/Users/Anna/Documents/workflows/rOpenSci/reviews/rdflib-review/../rdflib"
devtools::install(pkg_dir, dependencies = T)
#> Installing rdflib
#> '/Library/Frameworks/R.framework/Resources/bin/R' --no-site-file  \
#>   --no-environ --no-save --no-restore --quiet CMD INSTALL  \
#>   '/Users/Anna/Documents/workflows/rOpenSci/reviews/rdflib'  \
#>   --library='/Users/Anna/Library/R/3.4/library' --install-tests
#> 
#> Installing jqr
#> '/Library/Frameworks/R.framework/Resources/bin/R' --no-site-file  \
#>   --no-environ --no-save --no-restore --quiet CMD INSTALL  \
#>   '/private/var/folders/8p/87cqdx2s34vfvcgh04l6z72w0000gn/T/RtmpbYeNu9/devtools6d3b4a7582a1/jqr'  \
#>   --library='/Users/Anna/Library/R/3.4/library' --install-tests
#> 

If jqr is already installed, installation and vignette building proceed successfully.
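A possible workaround, sketched under the assumption that the failure is only caused by a Suggests package being absent when the vignettes are built: check for it first, install it if needed, then install with vignettes. The helper name missing_pkgs is mine (not part of devtools or rdflib), and listing only jqr is an assumption based on the error above.

```r
# Hypothetical helper (not part of devtools or rdflib): return the entries of
# `pkgs` that cannot be loaded from the current library.
missing_pkgs <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# Assumption: jqr is the only suggested package the vignette needs.
to_install <- missing_pkgs(c("jqr"))

# Left commented out so the sketch has no side effects when pasted:
# if (length(to_install) > 0) install.packages(to_install)
# devtools::install(pkg_dir, dependencies = TRUE, build_vignettes = TRUE)
```

A note along these lines in the README install instructions would probably be enough.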

tests and checks

All OK

documentation

My main suggestion is to try to define some terms and improve the concept map for the tools by adding some detail and broader context to the documentation. The following suggestions could also be addressed with links to further details if you think they are too superfluous for explicit documentation with the package.

  • a brief intro to the semantic web could be useful (eg something like):

    The semantic web aims to link data in a machine readable way through the web, making data more alignable and interoperable, and much easier to search, enrich and compute on.

  • what a graph format for data is (eg triples etc).

  • the structure of an rdf S3 object. You introduced some aspects of the data format here ("user does not have to manage world, model and storage objects by default just to perform standard operations and conversions"), which we are told we can ignore (which is great), but this actually creates more questions: what is this mysterious "world" object that forms an opaque slot of an rdf S3 object? It would be nice to explain the structure of the S3 rdf object briefly. Is there useful metadata that can be extracted from the structure? (see comment later)

  • rdf file formats. I think it would especially aid in appreciating the rdf_serialize function to expand briefly (and potentially signpost to a resource like this) on the various serialization formats, perhaps even why one would use one over another, and particularly why serialization involves writing a file out. I feel these are important concepts to help appreciate use cases of the function. Indeed, the file-out aspect of the function could be flagged more prominently in the function's man page: just by reading the description (somewhat jargony if you don't know what serialization is) and running the example, you end up writing a file without realising.

  • Similarly, parsing can then be seen/described as reading in/encoding rdf from its specific string formats.

Spelling a few things out in plain English could really help folks follow what's going on and understand which file types are inputs or outputs of the different functions.

how do I find info on URIs?

Some signposting/guidance on how I can find information on the semantics dictating what information I can extract from an rdf object would be really useful. eg. with a df or list you could use str to get an idea of how you could start indexing these objects. If confronted with a local rdf file, how would one go about figuring out even what they can query? I appreciate this is really one of the difficulties of working with rdf and semantic data in general (the flipside to the ease of being able to make unstructured queries is that we need to know how data are labelled) but I feel some brief guidance or demo on how one would approach this would go a long way.
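To illustrate the kind of demo I have in mind, here is a minimal sketch that lists every distinct predicate used in a graph, which is roughly the rdf equivalent of calling names() on a list. It assumes rdf_query accepts the rdf object and a SPARQL string positionally and that the backend supports SELECT DISTINCT. The query itself is generic SPARQL; the dc.rdf file is the one shipped with redland that I use further down.

```r
# Generic SPARQL: every distinct predicate (property URI) used in the graph.
predicate_query <- "SELECT DISTINCT ?p WHERE { ?s ?p ?o }"

# Only run the rdflib part if both packages are available.
if (requireNamespace("rdflib", quietly = TRUE) &&
    requireNamespace("redland", quietly = TRUE)) {
  doc <- system.file("extdata", "dc.rdf", package = "redland")
  rdf <- rdflib::rdf_parse(doc)
  # The result lists the property URIs one could then use in real queries.
  predicates <- rdflib::rdf_query(rdf, predicate_query)
  print(predicates)
}
```

Once the predicates are known, a user at least has the vocabulary needed to start writing more targeted queries.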

examples in general

For clarity to the reader who may not have looked at function documentation yet, I recommend using the full argument names when supplying arguments to functions in vignettes (if not always, at least the first time an argument is introduced).

SPARQL queries to JSON data section

At the end of the intro to the section, you write:

Here is a query that for all papers where I am an author, returns a table of given name, family name and year of publication:

Am I right in thinking though that you are co-author on all papers in the rdf but the query is in fact filtering the names of your co-authors? (through FILTER ( ?coi_family != "Boettiger" ))

Turning RDF-XML into more friendly JSON

It would be nice, if possible, to see sample printouts of the conversion of the different files, or at least of the effect of compaction.

rdf_add man page

Would be nice to see a demo of using one or more of the additional arguments.

Motivating example

I think an additional, more detailed motivating example might illustrate a more direct use case in a researcher's workflow. In particular it would be good to highlight the great potential of triplestore APIs (and celebrate the efforts of many cool, eg governmental, linked data initiatives). So an example that incorporates a query to a triplestore and then enrichment of a researcher's data could be a cool example. This could be a longer term project or even just an rOpenSci blogpost, but see the comment re: the rdf_query function below.

functionality

  • Serialising to turtle or trig throws an error
library(magrittr)
library(rdflib)
doc <- system.file("extdata", "dc.rdf", package="redland")
doc %>%
  rdf_parse() %>%
  rdf_serialize(doc = "test.turtle", format = "turtle")
#> librdf error - serializer 'turtle' not found
#> rdf_serializer.c:597: (librdf_serializer_serialize_model_to_file) assertion failed: object pointer of
#> type librdf_serializer is NULL.
doc <- system.file("extdata", "dc.rdf", package="redland")
doc %>%
  rdf_parse() %>%
  rdf_serialize(doc = "test.trig", format = "trig")
#> librdf error - serializer 'trig' not found
#> rdf_serializer.c:597: (librdf_serializer_serialize_model_to_file) assertion failed: object pointer of
#> type librdf_serializer is NULL.
  • In rdf_query, is there a way to return a non-regularised query result, ie return an rdf instead? I'm thinking about a use case where it may be better to enrich data by merging rdfs: a researcher queries a triplestore through an API (yeyyy open data!), combines their not fully matching but interoperable rdf data with rdf_add (ie try to show how a triplestore is better than tabular non-linked data for merging) and then queries the merged rdf to extract an enriched analytical tabular dataset?

Tests

Add tests for serialising to trig and turtle, which at the moment throws an error. Perhaps a test for parsing/serialising each format would be good. Also, it is perhaps worth checking whether eg rdf_parse(format="turtle") is working.
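As a rough idea of what such a test could look like, here is a sketch of a round-trip check per format. The helper round_trip is hypothetical, and the format names are assumptions about what rdf_serialize and rdf_parse accept. I have left trig out of the list because I am not sure rdf_parse is meant to read it; it could be appended if the package intends to support it.

```r
# Formats to exercise; the names are assumptions about the accepted values.
formats <- c("rdfxml", "ntriples", "turtle")

# Hypothetical helper: serialise `rdf` to a temp file in `format`, then parse
# that file back in and return the re-parsed object.
round_trip <- function(rdf, format) {
  out <- tempfile(fileext = paste0(".", format))
  rdflib::rdf_serialize(rdf, doc = out, format = format)
  rdflib::rdf_parse(out, format = format)
}

# Only run when the packages are installed, so the sketch is safe to paste.
if (requireNamespace("rdflib", quietly = TRUE) &&
    requireNamespace("redland", quietly = TRUE) &&
    requireNamespace("testthat", quietly = TRUE)) {
  doc <- system.file("extdata", "dc.rdf", package = "redland")
  rdf <- rdflib::rdf_parse(doc)
  for (fmt in formats) {
    testthat::test_that(paste("round trip works for", fmt), {
      # A serialiser that is not found would surface here as an error.
      testthat::expect_true(inherits(round_trip(rdf, fmt), "rdf"))
    })
  }
}
```

This only checks that each format can be written and read back; comparing the triples before and after would be a stronger test.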