What is the expected time to completion of robot diff? #47

cmungall · 2015-09-28T20:50:40Z

I'm trying to diff two successive versions of HPO, it's been running for 10 mins.

jamesaoverton · 2015-09-28T21:09:31Z

I'm trying a trivial diff of HPO now, and it's taking a long time and lot of memory.

The diff technique is the simplest thing that works: make two sets of axiom strings and compare them. I haven't profiled it for a range of ontologies. For OBI on my machine, it takes about a minute and 400MB of memory. It is obviously struggling for HPO, and it isn't huge.

Suggestions for a better approach are very welcome.

jamesaoverton · 2015-09-28T21:12:42Z

One option is for ROBOT to simply write axiom strings to files, and let Unix tools sort and diff them.

cmungall · 2015-09-28T22:30:51Z

That would be functional syntax then? With owlapi 3.5.3 the ordering
should be deterministic.

On 28 Sep 2015, at 14:12, James A. Overton wrote:

One option is for ROBOT to simply write axiom strings to files, and
let Unix tools sort and diff them.

Reply to this email directly or view it on GitHub:
#47 (comment)

jamesaoverton · 2015-09-28T22:51:48Z

The current code uses the Java toString() representations of each axiom, as two HashSets.

The HPO diff took 105 minutes for me, peaking at 1.4GB of memory.

cmungall · 2015-09-28T22:51:53Z

We're talking on the order of 100k axioms. A set intersection between two sets of that size should not take so long. Some java hashcode weirdness going on? @hdietze may be able to advise on the code

cmungall · 2015-09-29T18:28:22Z

$ time owljs-diff -C catalog-v001.xml hp-previous.owl hp-edit.owl > diff.md
Performing diff for type: Class
Performing diff for type: ObjectProperty

real    0m19.798s
user    0m30.842s
sys     0m0.903s

this code:

https://github.com/cmungall/owljs/blob/master/lib/Differ.js

problem is I want to abandon this, picked a bit of a losing horse with ringo. groovy/jython/clojure/straight java all better choices

jamesaoverton · 2015-10-06T17:42:28Z

The problem is loading and storing the full set of term labels for prettier diff output. Without labels I can diff HP in 1 minute.

I'll make the pretty printing optional, defaulting to off.

dosumis · 2015-10-07T09:44:39Z

Not sure how useful the diff will be without labels.

On Tue, Oct 6, 2015 at 6:42 PM, James A. Overton notifications@github.com
wrote:

The problem is loading and storing the full set of term labels for
prettier diff output. Without labels I can diff HP in 1 minute.

I'll make the pretty printing optional, defaulting to off.

—
Reply to this email directly or view it on GitHub
#47 (comment).

cmungall · 2015-10-07T14:15:34Z

Totally not following here. I take it as a given that the diff uses labels in the output. For example, if an axiom SubClassOf(A B) disappears then the output should show the labels for A and B.

But I'm not sure why storing labels would affect the time complexity in any way

jamesaoverton · 2015-10-07T14:45:37Z

This is the code, basically what I wrote for OBI 3 years ago: https://github.com/ontodev/robot/blob/master/robot-core/src/main/java/org/obolibrary/robot/DiffOperation.java#L69

The diff works by comparing sets of axioms as strings. The axiom strings contain IRIs, making them hard for humans to read. About the simplest axiom looks like this:

Declaration(Class(<http://purl.obolibrary.org/obo/OBI_0000070>))

So the addLabels method tries to do a regex match on IRIs and insert rdfs:labels into the axiom strings before printing them. To do this, all the labels are loaded into a map, up front.

None of this is very clever, and the labelling isn't even working properly for some of the examples I just checked. Doing it this way, the HP diff took me 105 minutes. Overriding with an empty map of labels, the HP diff took me one minute. In this case, I think loading all of HP's IRIs and labels into a map is too big and too slow, but I haven't tracked down the causes.

It needs to be fixed, somehow. I would like to see axioms like this:

Declaration(Class(<OBI:0000070>[assay]))

It needs to be reasonably efficient for largish ontologies such as HP. I'm not sure what the right approach is. I don't really want to implement a visitor for all the 57 OWLAPI axiom types. Chris' code groups stuff together into a proper report, although I've been happy enough with a classic diff format.

beckyjackson · 2019-03-14T16:29:10Z

Diff with labels is still slow for large ontologies. There is discussion of revising diff in #432 which may result in more efficient diffing. I'm going to close this issue in favor of that one, but please re-open if the completion time needs more discussion.

cmungall added the question label Sep 28, 2015

cmungall added a commit to obophenotype/human-phenotype-ontology that referenced this issue Sep 28, 2015

adding a target that uses robot to make a diff. However, takes a long…

519319d

… time, see ontodev/robot#47

jamesaoverton mentioned this issue Aug 21, 2018

Review signatures, add exceptions as required #349

Closed

jamesaoverton mentioned this issue Sep 18, 2018

Add diff option to force output of axioms with labels #361

Closed

beckyjackson closed this as completed Mar 14, 2019

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

What is the expected time to completion of robot diff? #47

What is the expected time to completion of robot diff? #47

cmungall commented Sep 28, 2015

jamesaoverton commented Sep 28, 2015

jamesaoverton commented Sep 28, 2015

cmungall commented Sep 28, 2015

jamesaoverton commented Sep 28, 2015

cmungall commented Sep 28, 2015

cmungall commented Sep 29, 2015

jamesaoverton commented Oct 6, 2015

dosumis commented Oct 7, 2015

cmungall commented Oct 7, 2015

jamesaoverton commented Oct 7, 2015

beckyjackson commented Mar 14, 2019

What is the expected time to completion of robot diff? #47

What is the expected time to completion of robot diff? #47

Comments

cmungall commented Sep 28, 2015

jamesaoverton commented Sep 28, 2015

jamesaoverton commented Sep 28, 2015

cmungall commented Sep 28, 2015

jamesaoverton commented Sep 28, 2015

cmungall commented Sep 28, 2015

cmungall commented Sep 29, 2015

jamesaoverton commented Oct 6, 2015

dosumis commented Oct 7, 2015

cmungall commented Oct 7, 2015

jamesaoverton commented Oct 7, 2015

beckyjackson commented Mar 14, 2019