What is the best practice for typing a fossil specimen? #3

Open
dennereed opened this issue Dec 11, 2016 · 5 comments

@dennereed
Contributor

dennereed commented Dec 11, 2016

For the first use case, a fossil mandible fragment, how do we use DwC to type the specimen, i.e., what are the appropriate values for "dcterms:type" and "basisOfRecord"? The solution John presented at TDWG 2016 was dcterms:type = PhysicalObject (from the DCMI Type Vocabulary) and basisOfRecord = FossilSpecimen. We need to draft a short paragraph explaining the rationale for these values, and also documenting whether the values should be string literals such as "PhysicalObject" or URIs such as "http://purl.org/dc/dcmitype/PhysicalObject".

@debpaul
Collaborator

debpaul commented Feb 16, 2017

Hey @dennereed if you haven't already done this, and still seek feedback, please post your question about

documenting whether the values should be string literals such as "PhysicalObject" or URIs such as "http://purl.org/dc/dcmitype/PhysicalObject"

to the dwc hour input form https://tinyurl.com/zja2muz

@dennereed
Contributor Author

Deb. Sorry for the delay. I just posted this issue to the tdwg-qa issue tracker (#58). Hope to get a response from John W. or Steve B.

@baskaufs

baskaufs commented Mar 8, 2017

This is a good question! I can give you an answer with respect to RDF, but I think the non-RDF answer is going to depend on whatever convention is established by the community, and John Wieczorek would be able to provide a better answer than I.

In RDF, the recommendation for typing things is to always use the well-known term rdf:type (http://www.w3.org/1999/02/22-rdf-syntax-ns#type) with a URI value. This is a fundamental property in RDF for describing what kind of thing something is. There is no prohibition against providing multiple values for the term, so you could say

ex:thing rdf:type dwc:FossilSpecimen;
         rdf:type dctype:PhysicalObject.

In RDF, Dublin Core recommends against using dcterms:type for the very reason that rdf:type is more well-known.[1]

The Darwin Core RDF guide considers use of dwc:basisOfRecord optional (Section 2.3.1.4).[2] If used in RDF, it should have a literal (string) value.
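
A minimal sketch of the statements described above, written out by hand as N-Triples lines in Python: two rdf:type statements with URI values, plus dwc:basisOfRecord with a string literal. The subject `http://example.org/thing` is a placeholder for the `ex:thing` used above, and the helper names (`iri`, `literal`, `ntriple`) are illustrative, not part of any library. The literal escaping is deliberately simplified.

```python
# Namespace IRIs for the prefixes used in the discussion.
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DWC = "http://rs.tdwg.org/dwc/terms/"
DCTYPE = "http://purl.org/dc/dcmitype/"
EX = "http://example.org/"  # assumption: placeholder namespace for ex:


def iri(value):
    """Wrap an IRI in angle brackets, as N-Triples requires."""
    return f"<{value}>"


def literal(value):
    """Quote a plain string literal (simplified: escapes only \\ and ")."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def ntriple(subject, predicate, obj):
    """Join three already-formatted terms into one N-Triples line."""
    return f"{subject} {predicate} {obj} ."


specimen = iri(EX + "thing")
triples = [
    # rdf:type takes URI values, and may be repeated.
    ntriple(specimen, iri(RDF + "type"), iri(DWC + "FossilSpecimen")),
    ntriple(specimen, iri(RDF + "type"), iri(DCTYPE + "PhysicalObject")),
    # dwc:basisOfRecord, if used at all in RDF, takes a string literal.
    ntriple(specimen, iri(DWC + "basisOfRecord"), literal("FossilSpecimen")),
]

for line in triples:
    print(line)
```

The point of the sketch is only the difference in object form: angle-bracketed IRIs for rdf:type, a quoted string for dwc:basisOfRecord.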

In my view, the basic problem that we have is that spreadsheets and tables are by their nature "flat". Although we like to think that a row in a table (a "record") represents one kind of thing, a row often contains metadata about several kinds of things, e.g. a specimen, a collector, a taxon, etc. It then becomes difficult to use a single column to describe everything covered in the row. In RDF, we get around this by breaking up the metadata into chunks and providing an rdf:type value for each chunk. That provides more clarity, but comes at the cost of increased complexity. When any individual user is presented with this dilemma, their response is usually "RDF is more complicated than what we need at the moment." and they move on. Hence, the lack of traction for RDF.
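
To make the "flat row versus typed chunks" contrast concrete, here is a rough Python sketch. Everything in it is made up for illustration: the sample values, the `COLUMN_OWNER` mapping of columns to the thing they describe, and the choice of `foaf:Person` for the collector. It is not a mapping prescribed by Darwin Core.

```python
# One flat "record": columns about three different kinds of things.
flat_row = {
    "catalogNumber": "EX-001",      # about the specimen (made-up value)
    "recordedBy": "A. Collector",   # about the collector (made-up value)
    "scientificName": "Homo sp.",   # about the taxon (made-up value)
}

# Assumption: a hand-written mapping from column to the thing it describes.
COLUMN_OWNER = {
    "catalogNumber": "specimen",
    "recordedBy": "collector",
    "scientificName": "taxon",
}

# Assumption: one illustrative rdf:type per chunk.
CHUNK_TYPE = {
    "specimen": "dwc:FossilSpecimen",
    "collector": "foaf:Person",
    "taxon": "dwc:Taxon",
}


def split_row(row):
    """Group a flat row's columns into chunks, each with its own rdf:type."""
    chunks = {}
    for column, value in row.items():
        owner = COLUMN_OWNER[column]
        chunk = chunks.setdefault(owner, {"rdf:type": CHUNK_TYPE[owner]})
        chunk[column] = value
    return chunks


chunks = split_row(flat_row)
for name, chunk in chunks.items():
    print(name, chunk)
```

The flat row needs one "type" column to cover all three things at once, while the split form carries a type per chunk at the price of the extra mapping tables.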

This is a longer answer than what you wanted, I'm sure. To come back to your specific question, I think the answer depends on how the data you are marking up is going to be used, and by whom. In RDF, we assume that we don't know who will be using the data, nor do we know what the use will be. You really can't take that approach with tables and spreadsheets: there needs to be some pre-existing understanding between the provider and the consumer about what ambiguous columns in a row "mean". I think that there is a general consensus in our community that dwc:basisOfRecord "means" the form of evidence that documents an occurrence record (the usual kind of row in a table sent to GBIF), and that it should have a string literal value. I'm not sure that there is a consensus about dcterms:type because it's not clear to me what people use that information for. Technically, dcterms:type (http://purl.org/dc/terms/type) should have a URI value and dc:type (http://purl.org/dc/elements/1.1/type) can have a literal value. But historically, TDWG has not really paid any attention to this distinction. I suspect you would find that people use dcterms:type very inconsistently. I would try to find out (from John W.) how dcterms:type is most commonly used and do the same. Otherwise, for dcterms:type just recommend either string or URI values and try to get your community to be consistent about it.
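
As a small illustration of the dcterms:type versus dc:type distinction, the Python sketch below picks the predicate according to whether the value looks like an HTTP(S) URI. The function names and the "looks like a URI" test are assumptions made for this example, not an established TDWG convention.

```python
from urllib.parse import urlparse

DCTERMS_TYPE = "http://purl.org/dc/terms/type"        # expects a URI value
DC_TYPE = "http://purl.org/dc/elements/1.1/type"      # may take a literal


def looks_like_uri(value):
    """Crude check: an http(s) scheme plus a host name."""
    parts = urlparse(value)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def type_predicate(value):
    """Choose dcterms:type for URI values, dc:type for string literals."""
    return DCTERMS_TYPE if looks_like_uri(value) else DC_TYPE


for value in ("http://purl.org/dc/dcmitype/PhysicalObject", "PhysicalObject"):
    print(value, "->", type_predicate(value))
```

A check like this could also be run over an existing dcterms:type column to see how consistently a dataset uses strings or URIs.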

Steve

[1] http://dublincore.org/documents/dc-rdf/#sect-5
[2] http://rs.tdwg.org/dwc/terms/guides/rdf/index.htm#2.3_Predicates

@debpaul
Collaborator

debpaul commented Mar 21, 2017

also @dennereed see http://wiki.dublincore.org/index.php/FAQ/DC_and_DCTERMS_Namespaces for some background on where the current situation comes from. Standards evolve :-)

@dennereed
Contributor Author

dennereed commented Nov 6, 2017

Created two wiki pages to help address this issue: one documenting the Darwin Core paleo use of basisOfRecord, and an FAQ page on the topic of typing fossil specimens. The former is more targeted and focuses on the use of Darwin Core paleo terms, whereas the latter is more general and addresses general issues of typing, which terms are appropriate, and their implementation in RDF.
