What is the expected time to completion of robot diff? #47
Comments
I'm trying a trivial diff of HPO now, and it's taking a long time and a lot of memory. The diff technique is the simplest thing that works: make two sets of axiom strings and compare them. I haven't profiled it for a range of ontologies. For OBI on my machine, it takes about a minute and 400MB of memory. It is obviously struggling with HPO, which isn't huge. Suggestions for a better approach are very welcome.
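As a rough illustration of the set-comparison approach described above, here is a minimal Java sketch. It is not ROBOT's DiffOperation code: the class name `AxiomStringDiff`, the method `onlyIn`, and the example axiom strings are hypothetical, and it assumes each axiom has already been rendered to a string.

```java
import java.util.Set;
import java.util.TreeSet;

/** Sketch only: diff two ontologies represented as sets of axiom strings. */
class AxiomStringDiff {

    /** Returns the strings that are in `from` but not in `other`, in sorted order. */
    static Set<String> onlyIn(Set<String> from, Set<String> other) {
        Set<String> result = new TreeSet<>(from); // copy, so the input set is not mutated
        result.removeAll(other);
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical axiom strings; a real run would hold one string per axiom.
        Set<String> left = Set.of(
            "SubClassOf(<http://example.org/A> <http://example.org/B>)",
            "SubClassOf(<http://example.org/B> <http://example.org/C>)");
        Set<String> right = Set.of(
            "SubClassOf(<http://example.org/A> <http://example.org/B>)",
            "SubClassOf(<http://example.org/B> <http://example.org/D>)");

        // Classic diff-style output: "-" for removed axioms, "+" for added ones.
        for (String axiom : onlyIn(left, right)) {
            System.out.println("- " + axiom);
        }
        for (String axiom : onlyIn(right, left)) {
            System.out.println("+ " + axiom);
        }
    }
}
```

Both arguments are typed as `Set` so that membership checks during `removeAll` stay cheap; with hash-based sets the comparison itself should scale roughly linearly with the number of axioms.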
One option is for ROBOT to simply write axiom strings to files, and let Unix tools sort and diff them.
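A minimal sketch of that idea in Java, under some assumptions: each axiom string fits on a single line, and the names `AxiomStringDump`, `sortedLines`, and `writeSorted` are hypothetical helpers, not part of ROBOT. Sorting and de-duplicating before writing means two dumps line up under a plain `diff`.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.TreeSet;

/** Sketch only: dump axiom strings one per line so external tools can compare them. */
class AxiomStringDump {

    /** De-duplicates the strings and returns them in String.compareTo order. */
    static List<String> sortedLines(Collection<String> axiomStrings) {
        return new ArrayList<>(new TreeSet<>(axiomStrings));
    }

    /** Writes the sorted strings to `target`, one per line, as UTF-8. */
    static void writeSorted(Collection<String> axiomStrings, Path target) throws IOException {
        Files.write(target, sortedLines(axiomStrings), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical axiom strings standing in for one ontology.
        List<String> axioms = List.of(
            "SubClassOf(<http://example.org/B> <http://example.org/C>)",
            "SubClassOf(<http://example.org/A> <http://example.org/B>)");

        Path out = Files.createTempFile("axioms", ".txt");
        writeSorted(axioms, out);
        System.out.println("Wrote " + out);
    }
}
```

With one such file per ontology (say `left.txt` and `right.txt`, names chosen here for illustration), `diff left.txt right.txt` reports the changed axioms. Because both files are written with the same ordering, no separate `sort` step is needed for `diff`.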
That would be functional syntax then? With owlapi 3.5.3 the ordering
On 28 Sep 2015, at 14:12, James A. Overton wrote:
The current code uses the Java toString() representations of each axiom, as two HashSets. The HPO diff took 105 minutes for me, peaking at 1.4GB of memory.
We're talking on the order of 100k axioms. A set intersection between two sets of that size should not take so long. Some Java hashCode weirdness going on? @hdietze may be able to advise on the code.
This code: https://github.com/cmungall/owljs/blob/master/lib/Differ.js The problem is that I want to abandon it; I picked a bit of a losing horse with Ringo. Groovy, Jython, Clojure, or straight Java would all be better choices.
The problem is loading and storing the full set of term labels for prettier diff output. Without labels I can diff HP in 1 minute. I'll make the pretty printing optional, defaulting to off.
Not sure how useful the diff will be without labels.
On Tue, Oct 6, 2015 at 6:42 PM, James A. Overton wrote:
Totally not following here. I take it as a given that the diff uses labels in the output. For example, if an axiom But I'm not sure why storing labels would affect the time complexity in any way.
This is the code, basically what I wrote for OBI 3 years ago: https://github.com/ontodev/robot/blob/master/robot-core/src/main/java/org/obolibrary/robot/DiffOperation.java#L69 The diff works by comparing sets of axioms as strings. The axiom strings contain IRIs, making them hard for humans to read. About the simplest axiom looks like this:
So the None of this is very clever, and the labelling isn't even working properly for some of the examples I just checked. Doing it this way, the HP diff took me 105 minutes. Overriding with an empty map of labels, the HP diff took me one minute. In this case, I think loading all of HP's IRIs and labels into a map is too big and too slow, but I haven't tracked down the causes. It needs to be fixed, somehow. I would like to see axioms like this:
It needs to be reasonably efficient for largish ontologies such as HP. I'm not sure what the right approach is. I don't really want to implement a visitor for all the 57 OWLAPI axiom types. Chris' code groups stuff together into a proper report, although I've been happy enough with a classic diff format.
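One possible way to add labels to axiom strings without a visitor per axiom type is sketched below. This is an assumption-laden illustration, not the code in DiffOperation.java: the class `AxiomLabeler`, the method `addLabels`, the regular expression, and the `<IRI>[label]` output format are all hypothetical. It scans each axiom string once and does one map lookup per IRI found, so the per-axiom cost does not depend on how many labels the map holds.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Sketch only: annotate IRIs in an axiom string with human-readable labels. */
class AxiomLabeler {

    // Assumption: IRIs appear in angle brackets, with no whitespace inside.
    private static final Pattern IRI = Pattern.compile("<([^<>\\s]+)>");

    /** Appends "[label]" after each IRI that has a label; other IRIs are left as-is. */
    static String addLabels(String axiom, Map<String, String> labels) {
        Matcher matcher = IRI.matcher(axiom);
        StringBuffer out = new StringBuffer();
        while (matcher.find()) {
            String label = labels.get(matcher.group(1)); // one lookup per IRI
            String replacement = (label == null)
                ? matcher.group()
                : matcher.group() + "[" + label + "]";
            // quoteReplacement guards against "$" or "\" inside IRIs and labels.
            matcher.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        matcher.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // Hypothetical IRI-to-label map and axiom string.
        Map<String, String> labels = Map.of("http://example.org/A", "apple");
        String axiom = "SubClassOf(<http://example.org/A> <http://example.org/B>)";
        System.out.println(addLabels(axiom, labels));
    }
}
```

Labels only need to be added to the axioms that actually differ, so one option is to run the comparison on the raw strings first and call something like `addLabels` on the results afterwards.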
Diff with labels is still slow for large ontologies. There is discussion of revising
I'm trying to diff two successive versions of HPO; it's been running for 10 minutes.