Add export command #481
See #459 |
This is a good start, but I'd like some significant changes. Then I'll review it again.
robot-core/src/main/java/org/obolibrary/robot/ExportOperation.java
I think this
|
|
|
Fixed, for example:
Updated to distinguish, for example:
Cardinality restrictions added, for example:
|
minor: spaces in OWL constructs look odd to me. I would |
For OPs, the default should always be E.g. `X SubClassOf 'part of' some Y`
I can't think of any use case for showing One exception may be for reversibility with templates. Maybe there could be a global option for this. |
A common idiom in TSVs is to stripe IDs and Labels. E.g.
would be good to see
Specifying this on a per OP basis could be tedious for the user. So I think we should have options that apply to any field that denotes an OWL object how about:
|
I am responsible for this shortform thing, I wouldn't encourage it. instead, how about allowing an OP to be specified by its rdfs:label? |
Apologies for the scattergun comments, and for not noticing this has been open for a while. Overall this is really awesome and I'm looking forward to having this in. It's good to think hard about and get a range of opinions about things like class expressions in exported values. The communities I work with would want a big dumb denormalized table with no parsing required. Something that can be loaded directly into pandas or an R dataframe |
Pretty cool feature! In case this was not tested: "," or tabs in all fields should be escaped properly. I assume multiple labels are piped as well. I am not that interested in the class expression export being parseable, since I hope this feature is for documentation purposes only (and not for reverse injecting this with template back into ontology; very unsafe IMHO). Will be incorporating this ODK as soon as it is out! |
The "striping" use case is a good addition. I think it's good for the @cmungall Are you sure that you don't want the output to include expressions? What if including expressions was an option, turned off by default? I was thinking about matching Yes, we should have better tests for escaping delimiters, and especially for escaping quotes. There are so many dumb edge cases that using a proper CSV library is probably worth it. I'd still like to know if we plan to add JSON output to this command in the future, in which case we might want to change things now. Like maybe change |
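On the escaping point above, a minimal sketch of what "use a proper CSV library" buys you. This is Python's standard `csv` module, used here only for illustration; the PR itself is Java, and the `EX:` IDs and the helper names `write_table` / `read_table` are made up for the example.

```python
import csv
import io


def write_table(rows, delimiter=","):
    """Serialize rows (lists of strings) using the csv module's default minimal quoting."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter=delimiter, lineterminator="\n")
    writer.writerows(rows)
    return buffer.getvalue()


def read_table(text, delimiter=","):
    """Parse text produced by write_table back into a list of rows."""
    return list(csv.reader(io.StringIO(text), delimiter=delimiter))


# Hypothetical rows containing the awkward characters discussed in the thread.
rows = [
    ["ID", "Label", "Definition"],
    ["EX:1", 'a "quoted" label', "contains, a comma"],
    ["EX:2", "tab\there", "plain"],
]

# Fields with delimiters or quotes survive a round trip unchanged.
assert read_table(write_table(rows)) == rows
```

The library only quotes fields that need it and doubles embedded quotes, which covers the edge cases that hand-rolled joining tends to miss.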
Sorry for the delay in responding
Off by default is fine, but I am not really sure I see the use case for including these. I have checked out the branch and running:
which gives me:
So But if it doesn't add complexity and you think there is a use case, I don't object to a non-default option to emit a partial class expression, so long as the default is to emit the My list of proposed changes prior to merge:
|
A minor annoyance:
The ERROR appears to be a false positive, because the file looks fine:
|
Gosh I'm just full of complaints amn't I... running this on hp.owl at the moment, seems very slow. Just writing this as note to self to do some profiling/optimization.
|
Would you prefer this looks like:
Or...
OK, I see your options in that comment. Is the default behavior to add these "label" columns?
Agreed that this should be allowed.
I'll take a look and see what's going on here. It's been a while since I've looked at this code 😅 |
I strongly recommend the former, i.e.
Yes, I think this makes sense as a default |
@beckyjackson Please remove all those |
If I try
However, Either make it case insensitive or change the help message |
Test:
gives:
There should be no quoting in TSV (the definition field is triple-double quoted!!!). Also, string literals should just emit the literal, not the xsd type or language. Again, as with class expressions, the general principle for tabular outputs is that values should be as atomic as possible. |
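To make the "just emit the literal" request concrete, here is a rough sketch. It assumes the rendered value looks like `"text"^^xsd:string` or `"text"@en`; that shape is an assumption for illustration, not taken from the actual output, and `lexical_form` is a hypothetical helper rather than anything in ROBOT or the OWL API.

```python
import re

# Matches a double-quoted literal with an optional datatype (^^...) or language tag (@...).
_LITERAL = re.compile(r'^"(?P<lex>.*)"(?:\^\^\S+|@[A-Za-z0-9-]+)?$', re.DOTALL)


def lexical_form(rendered):
    """Return only the text of a rendered literal; pass other values through unchanged."""
    match = _LITERAL.match(rendered)
    return match.group("lex") if match else rendered
```

A value that is not a quoted literal (for example a CURIE) is returned as-is, so the helper could be applied to every cell without special-casing.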
Also, still not emitting IDs:
|
Thanks @cmungall. @beckyjackson will work on these today. Keep them coming 😄 What IDs are you expecting in your previous comment? |
It also seems to be 'inferring' labels for unlabeled classes, e.g. for merged classes:
yields:
I'm guessing the URI is used as the label if not present (and a slightly different CURIE contracting algorithm...) Here the field value should be blank/empty |
Request/proposal: use |
@cmungall - if you're trying to get the IDs of terms, the And I agree that we should use ID instead of CURIE, since that's what the tag is. |
@cmungall We think we've addressed all your comments from this morning. Please try again. |
extract.md says:
In fact, exclude is true by default (which is my preferred default), so the docs should be changed UPDATE I see in fact it is false by default. This is not my preference but I can live with this. |
I see. What do we think of making the ID or IRI the default (e.g. |
Can we also remove tautologies by default? I don't think it's useful to see that root nodes and obsolete classes are subClasses of owl:Thing. (you could argue that it is not completely content free - e.g. someone may have manually classified an incoherent class under Nothing, or we could be running export post-reason, but even here, the inclusion of the assertion is so arbitrary depending on a sequence of owlapi operations, it renders it useless for any purpose) |
Can I just say again this command is AWESOME OK, I think we just have to make a decision on the following two things:
I can live with whichever decision is made either way but it's worth making a considered decision here A few other minor things that can be punted to a future release so long as they are not considered compatibility breaking, just adding so they do not get forgotten:
Docs (I can make these changes later):
|
Thanks for the detailed feedback @cmungall!
To Do before release:
|
I like the idea of renaming The only use case I see for |
Good question about missing LABELs. I think I'm ok with unlabelled things disappearing, as long as the documentation is clear. But I'm not completely sure... If we ask for LABEL and an unlabelled term is part of a class expression like "foo subclass of ID:123", then what would we see? |
We would see
|
All checklist items above have been addressed. I updated the documentation in the first comment to reflect all new behavior. |
My plan is to merge this now, then ask obo-tools for more feedback before releasing 1.7.0. @cmungall Does that sound good? |
I am testing master now, so far it all looks fantastic, thanks! |
Export
ROBOT can export details about ontology entities as a table. At minimum, the `export` command expects an input ontology (`--input`), a set of column headers (`--header`), and a file to write to (`--export`):
Formats
The following formats are currently supported:
tsv
csv
html
These can be specified with the `--format` option:
If this option is not included, `export` will predict the format based on the file extension. If the extension does not match with an existing format, it will default to `tsv`.
The `html` format will output an HTML table with Bootstrap styling. All entities referenced will be rendered as clickable links.
Columns
The `--header` option is a pipe-separated list of special keywords or properties used in the ontology. The columns in the `--header` argument will exactly match the first line of the export file (the column headers).
Various `--header` types are supported:
`IRI`: creates an "IRI" column based on the full unique identifier
`ID`: creates an "ID" column based on the short form of the unique identifier (CURIE)
`LABEL`: creates a "Label" column based on `rdfs:label`
`SYNONYMS`: creates a "SYNONYMS" column based on all synonyms (oboInOwl exact, broad, narrow, related, or IAO alternative term)
`SubClass Of`: creates a "SubClass Of" column based on `rdfs:subClassOf`
`SubClasses`: creates a "SubClasses" column based on direct children of a class
`Equivalent Class`: creates an "Equivalent Classes" column based on `owl:equivalentClass`
`SubProperty Of`: creates a "SubProperty Of" column based on `rdfs:subPropertyOf`
`Equivalent Property`: creates an "Equivalent Properties" column based on `owl:equivalentProperty`
`Disjoint With`: creates a "Disjoint With" column based on `owl:disjointWith`
`Type`: creates an "Instance Of" column based on `rdf:type` for named individuals
a property CURIE (e.g. `oboInOwl:hasDbXref`). Any prefix used must be defined.
a property label (e.g. `database_cross_reference`). This label will also be used as the column header.
The first header in the `--header` list is used to sort the rows of the export. You can change the column that is sorted on by including `--sort <header>`. This can either be one header, or a pipe-separated list of headers that will be sorted in-order:
In the example above, the rows are first sorted on the `NAME` field, and then sorted by `SubClass Of`. This means that entities with the same parent will be grouped in alphabetical order.
If the `--sort` header starts with `^`, the column will be sorted in reverse order.
All special keyword columns will include both named OWL objects (named classes, properties, and individuals) and anonymous expressions (class expressions, property expressions). When using another object or data property, the values will include both individuals and class expressions (from subclass or equivalent statements) in Manchester syntax. When using an annotation property, the literal value will be returned.
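The sorting rules above can be sketched in a few lines of Python. This is an illustration only, not ROBOT's Java implementation: it takes the description literally and applies one stable sort per header in the order listed, so the last header ends up as the outermost grouping. Whether ROBOT orders the keys this way is an assumption; the function name `sort_rows` and the sample rows are hypothetical.

```python
def sort_rows(rows, sort_spec):
    """Sort dict rows by a pipe-separated list of headers; a '^' prefix reverses that column."""
    keys = [key.strip() for key in sort_spec.split("|") if key.strip()]
    result = list(rows)
    # Apply one stable sort per header, in the order listed. Because each sort
    # is stable, rows that tie on a later header keep the order set by earlier ones.
    for key in keys:
        reverse = key.startswith("^")
        column = key[1:] if reverse else key
        result.sort(key=lambda row: row.get(column, ""), reverse=reverse)
    return result


# Hypothetical export rows.
rows = [
    {"Label": "b", "SubClass Of": "x"},
    {"Label": "a", "SubClass Of": "y"},
    {"Label": "c", "SubClass Of": "x"},
]
by_parent = sort_rows(rows, "Label|SubClass Of")
```

With these rows, entities sharing the parent `x` are grouped together and appear in label order within that group, which matches the "grouped in alphabetical order" behaviour described in the text.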
By default, multiple values in a cell are separated with a pipe character (`|`). You can update this to anything you'd like with the `--split` option. For example, you could separate with commas:
The output of any cell with multiple values is sorted in alphabetical order.
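As a small illustration of the multi-value behaviour just described, the following Python sketch joins a cell's values with a configurable separator after sorting them. The helper name `render_cell` is hypothetical, and plain `sorted()` is case-sensitive, which may or may not match how ROBOT orders values.

```python
def render_cell(values, split="|"):
    """Join the values of one cell with the separator, in alphabetical order."""
    return split.join(sorted(values))


# Hypothetical synonym values for a single entity.
synonyms = ["narrow term", "broad term", "exact term"]
default_cell = render_cell(synonyms)
comma_cell = render_cell(synonyms, split=",")
```

Anyone consuming the table can recover the individual values by splitting the cell on the same separator, provided the separator does not occur inside a value.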
Including and Excluding Entities
By default, the export includes details on the classes and individuals in an ontology. Properties are excluded. You can configure which types of entities you wish to include with the `--include <entity types>` option. The `<entity types>` argument is a space-, comma-, or tab-separated list of one or more of the following entity types:
classes
individuals
properties
For example, to return the details of individuals only:
To return details of classes and properties:
The `--include` option does not need to be specified if you are getting details on individuals and classes. If you do specify an `--include`, it cannot be an empty string, as no entities will be included in the export.
Finally, the export will include anonymous expressions (subclasses, equivalent classes, property expressions). If you only wish to include named entities, add `--exclude-anonymous true`:
Note that in the example above, the first two headers are special keywords and the third is the label of a property used in the ontology.
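The parsing rules for the `--include` argument described in this section (space-, comma-, or tab-separated; classes and individuals by default; empty string rejected) could be sketched as follows. This is a Python illustration under those stated rules, not ROBOT's code; `parse_include` is a hypothetical name, and treating the entity types as lowercase and case-sensitive is an assumption.

```python
import re

ENTITY_TYPES = {"classes", "individuals", "properties"}
DEFAULT_INCLUDE = {"classes", "individuals"}


def parse_include(arg=None):
    """Turn an --include style argument into a set of entity types."""
    if arg is None:
        # Option not given: fall back to the documented default.
        return set(DEFAULT_INCLUDE)
    tokens = [token for token in re.split(r"[ ,\t]+", arg.strip()) if token]
    if not tokens:
        raise ValueError("--include cannot be empty: no entities would be exported")
    unknown = set(tokens) - ENTITY_TYPES
    if unknown:
        raise ValueError("unknown entity type(s): " + ", ".join(sorted(unknown)))
    return set(tokens)
```

Rejecting unknown tokens is a design choice of this sketch; the text only states that an empty value is not allowed.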
Rendering Cell Values
Entities used in cell values are rendered by one of four different strategies:
`NAME` - render the entity by label (if label does not exist, entity is rendered by CURIE)
`ID` - render the entity by short form ID/CURIE
`IRI` - render the entity by full IRI
`LABEL` - render the entity by label ONLY (if label does not exist, entity is rendered as an empty string)
By default, values are rendered with the `NAME` strategy. To update the strategy globally, you can use the `--entity-format` option and provide one of the above values:
In the above example, all the "subclass of" values will be rendered by their short form ID.
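The four rendering strategies listed above amount to a small dispatch on the strategy name. The Python sketch below shows that logic under the assumption that each entity already has an IRI, a CURIE, and an optional label available; `render_entity` and the example identifiers are hypothetical and not part of ROBOT.

```python
def render_entity(iri, curie, label=None, strategy="NAME"):
    """Render one entity according to a NAME, ID, IRI, or LABEL strategy."""
    if strategy == "NAME":
        # Label if present, otherwise fall back to the CURIE.
        return label if label else curie
    if strategy == "ID":
        return curie
    if strategy == "IRI":
        return iri
    if strategy == "LABEL":
        # Label only: unlabelled entities render as an empty string.
        return label or ""
    raise ValueError("unknown rendering strategy: " + strategy)


# Hypothetical entity without a label.
unlabelled = ("http://example.org/EX_2", "EX:2", None)
```

The only difference between `NAME` and `LABEL` is the fallback for unlabelled entities, which is the case discussed earlier in this thread.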
You can also specify different rendering strategies for different columns by including the strategy name in a square-bracket-enclosed tag after the column name:
These tags should not be used with the following default columns: `LABEL`, `ID`, or `IRI` as they will not change the rendered values.
Preparing the Ontology
When exporting details on classes using object or data properties, we recommend running `reason`, `relax`, and `reduce` first. You can also create a subset of entities using `remove` or `filter`.