Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide.
- As the reviewer I confirm that there are no conflicts of interest for me to review this work (If you are unsure whether you are in conflict, please speak to your editor before starting your review).
The package includes all the following forms of documentation:
- A statement of need clearly stating problems the software is designed to solve and its target audience in README
- Installation instructions: for the development version of package and any non-standard dependencies in README
- Vignette(s) demonstrating major functionality that runs successfully locally
- Function Documentation: for all exported functions in R help
- Examples for all exported functions in R Help that run successfully locally
- Community guidelines including contribution guidelines in the README or CONTRIBUTING, and DESCRIPTION with URL, BugReports and Maintainer (which may be autogenerated via Authors@R).
- The package has an obvious research application according to JOSS's definition
The package contains a paper.md matching JOSS's requirements with:
- A short summary describing the high-level functionality of the software
- Authors: A list of authors with their affiliations
- A statement of need clearly stating problems the software is designed to solve and its target audience.
- References: with DOIs for all those that have one (e.g. papers, datasets, software).
- Installation: Installation succeeds as documented.
- Functionality: Any functional claims of the software have been confirmed.
- Performance: Any performance claims of the software have been confirmed.
- Automated tests: Unit tests cover essential functions of the package and a reasonable range of inputs and conditions. All tests pass on the local machine.
- Packaging guidelines: The package conforms to the rOpenSci packaging guidelines
- The author has responded to my review and made changes to my satisfaction. I recommend approving this package.
Estimated hours spent reviewing: 7
This package is a great and lightweight addition to working with rdf and linked data in R. Coming after my review of the codemetar package, which introduced me to linked data, I found it a great learning experience in something I'm really interested in but still quite a novice at, so I hope my feedback helps you appreciate that POV.
Overall I feel package functionality is complete and self-contained (apart from one error discussed below). My main feedback is regarding documentation, specifically how it could be improved to help novice users to grasp the value of semantic data and better understand how the package works.
The only install comment I'll add is that when I first ran install(pkg_dir, dependencies = T, build_vignettes = T), the building of the vignettes threw an error, apparently because the Suggests package ‘jqr’ had not been installed yet. It worked without build_vignettes = T.
pkg_dir <- "/Users/Anna/Documents/workflows/rOpenSci/reviews/rdflib-review/../rdflib"
devtools::install(pkg_dir, dependencies = T, build_vignettes = T)
#> Installing rdflib
#> '/Library/Frameworks/R.framework/Resources/bin/R' --no-site-file \
#> --no-environ --no-save --no-restore --quiet CMD build \
#> '/Users/Anna/Documents/workflows/rOpenSci/reviews/rdflib' \
#> --no-resave-data --no-manual
#>
#> Error: Command failed (1)
with the console output:
* checking for file ‘/Users/Anna/Documents/workflows/rOpenSci/reviews/rdflib/DESCRIPTION’ ... OK
* preparing ‘rdflib’:
* checking DESCRIPTION meta-information ... OK
* installing the package to build vignettes
* creating vignettes ... ERROR
Quitting from lines 21-38 (rdflib.Rmd)
Error: processing vignette 'rdflib.Rmd' failed with diagnostics:
there is no package called 'jqr'
Execution halted
Installing without building the vignette results in successful installation of jqr.
pkg_dir <- "/Users/Anna/Documents/workflows/rOpenSci/reviews/rdflib-review/../rdflib"
devtools::install(pkg_dir, dependencies = T)
#> Installing rdflib
#> '/Library/Frameworks/R.framework/Resources/bin/R' --no-site-file \
#> --no-environ --no-save --no-restore --quiet CMD INSTALL \
#> '/Users/Anna/Documents/workflows/rOpenSci/reviews/rdflib' \
#> --library='/Users/Anna/Library/R/3.4/library' --install-tests
#>
#> Installing jqr
#> '/Library/Frameworks/R.framework/Resources/bin/R' --no-site-file \
#> --no-environ --no-save --no-restore --quiet CMD INSTALL \
#> '/private/var/folders/8p/87cqdx2s34vfvcgh04l6z72w0000gn/T/RtmpbYeNu9/devtools6d3b4a7582a1/jqr' \
#> --library='/Users/Anna/Library/R/3.4/library' --install-tests
#>
If jqr is installed, installation and vignette building proceed successfully.
All OK
My main suggestion is to try to define some terms and improve the concept map for the tools by adding some detail and broader context to the documentation. The following suggestions could also be addressed with links to further details if you think they are superfluous for explicit documentation with the package.
- A brief intro to the semantic web could be useful (e.g. something like):
  The semantic web aims to link data in a machine-readable way through the web, making data more alignable and interoperable, and much easier to search, enrich and compute on.
- What a graph format for data is (e.g. triples etc.).
- The structure of an rdf S3 object. You introduced some aspects of the data format here: (user does not have to manage world, model and storage objects by default just to perform standard operations and conversions), which we are told we can ignore (which is great), but this actually creates more questions... What is this mysterious "world" object that forms an opaque slot of an rdf S3 object? It would be nice to explain the structure of the S3 rdf object briefly. Is there useful metadata that can be extracted from the structure? (See comment later.)
- rdf file formats. I think it would especially aid in appreciating the rdf_serialize function to expand briefly (and potentially signpost to a resource like this) on the various serialization formats, perhaps even why one would use one over another, and particularly why serialization involves writing a file out. I feel these are important concepts to help appreciate use cases of the function. Indeed, the file-out aspect of the function could do with being flagged more prominently in the function man page: just by looking at the description (somewhat jargony if you don't know what serialization is) and running the example, you end up writing a file without realising.
- Similarly, parsing can then be seen/described as reading in/encoding an rdf object from one of these specific string formats.
Spelling a few things out in plain English could really help folks follow what's going on better and understand what file types are inputs or outputs of different functions.
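To illustrate the kind of plain-language explanation of triples I have in mind, here is a minimal base R sketch. It does not use rdflib at all; the ex: identifiers and the name "Doe" are made up for illustration, and a data.frame is only a stand-in for a real graph.

```r
# A graph written as triples: one row per statement (subject, predicate, object).
# All identifiers below are hypothetical placeholders, not a real vocabulary.
triples <- data.frame(
  subject   = c("ex:paper1", "ex:paper1", "ex:person1"),
  predicate = c("ex:author", "ex:year", "ex:familyName"),
  object    = c("ex:person1", "2018", "Doe"),
  stringsAsFactors = FALSE
)

# "What do we know about paper1?" is a filter on the subject column.
about_paper1 <- triples[triples$subject == "ex:paper1", ]

# Following a link: the object of one triple is the subject of another.
author_id <- triples$object[triples$predicate == "ex:author"]
author_name <- triples$object[
  triples$subject == author_id & triples$predicate == "ex:familyName"
]
print(about_paper1)
print(author_name)
```

Something this small in the README or vignette would already show why the format is called a graph: statements join up wherever an object in one row is a subject in another.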
Some signposting/guidance on how I can find information on the semantics dictating what information I can extract from an rdf object would be really useful. E.g. with a df or list you could use str to get an idea of how you could start indexing these objects. If confronted with a local rdf file, how would one go about figuring out even what they can query? I appreciate this is really one of the difficulties of working with rdf and semantic data in general (the flipside to the ease of being able to make unstructured queries is that we need to know how data are labelled), but I feel some brief guidance or demo on how one would approach this would go a long way.
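One possible starting point, offered as a sketch rather than a recommendation from the package authors, is to ask the graph for its distinct predicates, which play roughly the role that names() or str() play for a list. This assumes the dc.rdf example file from the redland package (the same file used further down in this review) and that rdf_query returns a data frame with one column per SPARQL variable.

```r
library(rdflib)

# Example RDF/XML file shipped with the redland package
doc <- system.file("extdata", "dc.rdf", package = "redland")
x <- rdf_parse(doc, format = "rdfxml")

# Every statement is (subject, predicate, object); listing the distinct
# predicates shows which properties are available to query on.
sparql <- "SELECT DISTINCT ?p WHERE { ?s ?p ?o }"
predicates <- rdf_query(x, sparql)
print(predicates)
```

From there, a second query on one predicate of interest would show what its subjects and objects look like.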
For clarity to the reader who may not have looked at the function documentation yet, I recommend using the full argument names when supplying arguments to functions in vignettes (if not always, at least the first time an argument is introduced).
At the end of the intro to the section, you write:
Here is a query that for all papers where I am an author, returns a table of given name, family name and year of publication:
Am I right in thinking, though, that you are co-author on all papers in the rdf but the query is in fact filtering the names of your co-authors (through FILTER ( ?coi_family != "Boettiger" ))?
It would be nice, if possible, to see a sample printout of the conversion of the different files, or at least of the effect of compaction.
Would be nice to see a demo of using one or more of the additional arguments.
I think an additional, more detailed motivating example might illustrate a more direct use case in a researcher's workflow. In particular it would be good to highlight the great potential of triplestore APIs (and celebrate the efforts of many cool linked data initiatives, e.g. governmental ones). So an example that incorporates a query to a triplestore and then enrichment of a researcher's data could be a cool example. This could be a longer term project or even just an rOpenSci blogpost, but see comment re: rdf_query function below.
- Serialising to turtle or trig throws an error
library(magrittr)
library(rdflib)
doc <- system.file("extdata", "dc.rdf", package = "redland")
doc %>%
  rdf_parse() %>%
  rdf_serialize(doc = "test.turtle", format = "turtle")
#> librdf error - serializer 'turtle' not found
#> rdf_serializer.c:597: (librdf_serializer_serialize_model_to_file) assertion failed: object pointer of
#> type librdf_serializer is NULL.
doc <- system.file("extdata", "dc.rdf", package = "redland")
doc %>%
  rdf_parse() %>%
  rdf_serialize(doc = "test.trig", format = "trig")
#> librdf error - serializer 'trig' not found
#> rdf_serializer.c:597: (librdf_serializer_serialize_model_to_file) assertion failed: object pointer of
#> type librdf_serializer is NULL.
- In rdf_query, is there a way to return a non-regularised query result, i.e. return an rdf instead? I'm thinking about a use case where maybe it's better to enrich data by merging rdfs, i.e. a researcher queries a triplestore through an API (yeyyy open data!), combines their not fully matching but interoperable rdf data with rdf_add (i.e. to show how a triplestore is better than tabular non-linked data for merging) and then queries the merged rdf to extract an enriched analytical tabular dataset?
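To make the workflow I am imagining more concrete, here is a rough sketch. It does not answer the question of returning an rdf from rdf_query; it only shows the "combine, then query" step, with the "remote" statements typed in by hand rather than fetched from a triplestore. All example.org URIs and values are hypothetical, and I am assuming rdf_add updates the rdf object in place and accepts character literals.

```r
library(rdflib)

# One graph standing in for the researcher's own data (hypothetical URIs/values)
g <- rdf()
rdf_add(g,
        subject   = "http://example.org/species1",
        predicate = "http://example.org/bodyMass",
        object    = "12.5")

# A statement that, in a real workflow, might come from a triplestore API.
# It shares the subject URI, so it links up with the local data automatically.
rdf_add(g,
        subject   = "http://example.org/species1",
        predicate = "http://example.org/habitat",
        object    = "forest")

# Query the combined graph to extract an enriched, tabular dataset
sparql <- paste(
  "SELECT ?species ?mass ?habitat WHERE {",
  "  ?species <http://example.org/bodyMass> ?mass .",
  "  ?species <http://example.org/habitat> ?habitat .",
  "}"
)
enriched <- rdf_query(g, sparql)
print(enriched)
```

The appeal compared with a tabular join is that no key columns need to be agreed on beforehand; shared URIs do that work.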
Add tests for being able to serialise to trig and turtle, which at the moment throws an error.
Perhaps a test for parsing/serialising each format would be good. Also, perhaps worth checking whether e.g. rdf_parse(format="turtle") is working.
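A per-format round-trip test could look roughly like the sketch below. The list of formats is my assumption and should be replaced with whatever the package documents as supported; on my machine the turtle case would currently fail, which is exactly what such a test should flag.

```r
library(testthat)
library(rdflib)

# Example RDF/XML file shipped with the redland package
doc <- system.file("extdata", "dc.rdf", package = "redland")
x <- rdf_parse(doc, format = "rdfxml")

# Assumed list of formats to check; adjust to the formats the package supports
formats <- c("rdfxml", "ntriples", "turtle")

for (fmt in formats) {
  test_that(paste("serialize and re-parse works for", fmt), {
    out <- tempfile(fileext = paste0(".", fmt))
    # Write the graph out in this format ...
    rdf_serialize(x, doc = out, format = fmt)
    expect_true(file.exists(out))
    # ... and read it back in using the same format
    roundtrip <- rdf_parse(out, format = fmt)
    expect_s3_class(roundtrip, "rdf")
  })
}
```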