Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

What is the expected time to completion of robot diff? #47

Closed
cmungall opened this issue Sep 28, 2015 · 11 comments
Closed

What is the expected time to completion of robot diff? #47

cmungall opened this issue Sep 28, 2015 · 11 comments
Labels

Comments

@cmungall
Copy link
Contributor

I'm trying to diff two successive versions of HPO, it's been running for 10 mins.

cmungall added a commit to obophenotype/human-phenotype-ontology that referenced this issue Sep 28, 2015
@jamesaoverton
Copy link
Member

I'm trying a trivial diff of HPO now, and it's taking a long time and lot of memory.

The diff technique is the simplest thing that works: make two sets of axiom strings and compare them. I haven't profiled it for a range of ontologies. For OBI on my machine, it takes about a minute and 400MB of memory. It is obviously struggling for HPO, and it isn't huge.

Suggestions for a better approach are very welcome.

@jamesaoverton
Copy link
Member

One option is for ROBOT to simply write axiom strings to files, and let Unix tools sort and diff them.

@cmungall
Copy link
Contributor Author

That would be functional syntax then? With owlapi 3.5.3 the ordering
should be deterministic.

On 28 Sep 2015, at 14:12, James A. Overton wrote:

One option is for ROBOT to simply write axiom strings to files, and
let Unix tools sort and diff them.


Reply to this email directly or view it on GitHub:
#47 (comment)

@jamesaoverton
Copy link
Member

The current code uses the Java toString() representations of each axiom, as two HashSets.

The HPO diff took 105 minutes for me, peaking at 1.4GB of memory.

@cmungall
Copy link
Contributor Author

We're talking on the order of 100k axioms. A set intersection between two sets of that size should not take so long. Some java hashcode weirdness going on? @hdietze may be able to advise on the code

@cmungall
Copy link
Contributor Author

$ time owljs-diff -C catalog-v001.xml hp-previous.owl hp-edit.owl > diff.md
Performing diff for type: Class
Performing diff for type: ObjectProperty

real    0m19.798s
user    0m30.842s
sys     0m0.903s

this code:

https://github.com/cmungall/owljs/blob/master/lib/Differ.js

problem is I want to abandon this, picked a bit of a losing horse with ringo. groovy/jython/clojure/straight java all better choices

@jamesaoverton
Copy link
Member

The problem is loading and storing the full set of term labels for prettier diff output. Without labels I can diff HP in 1 minute.

I'll make the pretty printing optional, defaulting to off.

@dosumis
Copy link

dosumis commented Oct 7, 2015

Not sure how useful the diff will be without labels.

On Tue, Oct 6, 2015 at 6:42 PM, James A. Overton notifications@github.com
wrote:

The problem is loading and storing the full set of term labels for
prettier diff output. Without labels I can diff HP in 1 minute.

I'll make the pretty printing optional, defaulting to off.


Reply to this email directly or view it on GitHub
#47 (comment).

@cmungall
Copy link
Contributor Author

cmungall commented Oct 7, 2015

Totally not following here. I take it as a given that the diff uses labels in the output. For example, if an axiom SubClassOf(A B) disappears then the output should show the labels for A and B.

But I'm not sure why storing labels would affect the time complexity in any way

@jamesaoverton
Copy link
Member

This is the code, basically what I wrote for OBI 3 years ago: https://github.com/ontodev/robot/blob/master/robot-core/src/main/java/org/obolibrary/robot/DiffOperation.java#L69

The diff works by comparing sets of axioms as strings. The axiom strings contain IRIs, making them hard for humans to read. About the simplest axiom looks like this:

Declaration(Class(<http://purl.obolibrary.org/obo/OBI_0000070>))

So the addLabels method tries to do a regex match on IRIs and insert rdfs:labels into the axiom strings before printing them. To do this, all the labels are loaded into a map, up front.

None of this is very clever, and the labelling isn't even working properly for some of the examples I just checked. Doing it this way, the HP diff took me 105 minutes. Overriding with an empty map of labels, the HP diff took me one minute. In this case, I think loading all of HP's IRIs and labels into a map is too big and too slow, but I haven't tracked down the causes.

It needs to be fixed, somehow. I would like to see axioms like this:

Declaration(Class(<OBI:0000070>[assay]))

It needs to be reasonably efficient for largish ontologies such as HP. I'm not sure what the right approach is. I don't really want to implement a visitor for all the 57 OWLAPI axiom types. Chris' code groups stuff together into a proper report, although I've been happy enough with a classic diff format.

@beckyjackson
Copy link
Contributor

Diff with labels is still slow for large ontologies. There is discussion of revising diff in #432 which may result in more efficient diffing. I'm going to close this issue in favor of that one, but please re-open if the completion time needs more discussion.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

4 participants