Add the NEO into the main pipeline #35
Well, having started to explore this a little bit, it will not pan out as a "merged" index--we would clobber "general", which is used (for example) by the ubernoodle for NEO.
Switch to Solr 6 and use a separate core?
The idea is to simplify our current setup, reducing the number of deployed servers and/or the number of distinct pipelines. As the Solr 6.x (higher now) upgrade is orthogonal, splitting it out separately would be at least a temporary bump up in the above.
From an earlier experiment, the overlay is problematic. We'll work towards the weaker form to make progress on things like #73 and geneontology/neo#38 (comment)
Until we have a fix for the NEO job automation, it will be a manual step.
From @hdrabkin: I had created 6 new PRO ids and they became available in our MGI GO EI on Friday. That means they are in the mgi.gpi (I verified), which I expected would then make them available in Noctua today, but they are not there.
Also see geneontology/neo#38 (comment)
So does this mean these ids will be available soon?
A manual load is finishing now and a spot check seems positive -- try them now? @cmungall I think there may be something up with owltools and the NEO load. It seems to slow down towards the end of the ontology document loading (not for the general docs), eventually giving out. I'll try to get a more nuanced view at some point, but it may be best to look at this as a use case for a new python loader after the go-cams.
Actually, I'm not sure we use anything but the "general" doc in the index...
@cmungall We'll need to discuss 1) how we want to migrate the neo build to a new pipeline (whether main or not) and 2) what actual deployment looks like for the ontology
This will need to be tested a bit more, but it looks like the additional resources and updates on our new pipeline can make short work of the NEO products build:
From @cmungall: the PURLs are from the given S3 bucket, not Jenkins, so we just clobber them out.
Need more mem for Java:
We could easily split neo into multiple separate files to be read. It seems like the current approach won't scale if we add SwissProt.
…On Thu, Jan 24, 2019 at 6:05 PM kltm ***@***.***> wrote:
Need more mem for Java:
/obo/BFO_0000040> "BFO:0000040"^^xsd:string) AnnotationAssertion(<http://www.geneontology.org/formats/oboInOwl#id> <http://purl.obolibrary.org/obo/CHEBI_23367> "CHEBI:23367"^^xsd:string) }
18:02:38 Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
18:02:38 at com.carrotsearch.hppcrt.sets.ObjectHashSet$EntryIterator.<init>(ObjectHashSet.java:734)
18:02:38 at com.carrotsearch.hppcrt.sets.ObjectHashSet$1.create(ObjectHashSet.java:784)
18:02:38 at com.carrotsearch.hppcrt.sets.ObjectHashSet$1.create(ObjectHashSet.java:779)
18:02:38 at com.carrotsearch.hppcrt.ObjectPool.<init>(ObjectPool.java:74)
18:02:38 at com.carrotsearch.hppcrt.IteratorPool.<init>(IteratorPool.java:51)
18:02:38 at com.carrotsearch.hppcrt.sets.ObjectHashSet.<init>(ObjectHashSet.java:778)
18:02:38 at com.carrotsearch.hppcrt.sets.ObjectHashSet.<init>(ObjectHashSet.java:157)
18:02:38 at uk.ac.manchester.cs.owl.owlapi.HPPCSet.<init>(MapPointer.java:444)
18:02:38 at uk.ac.manchester.cs.owl.owlapi.MapPointer.putInternal(MapPointer.java:324)
18:02:38 at uk.ac.manchester.cs.owl.owlapi.MapPointer.init(MapPointer.java:151)
18:02:38 at uk.ac.manchester.cs.owl.owlapi.MapPointer.getValues(MapPointer.java:190)
18:02:38 at uk.ac.manchester.cs.owl.owlapi.OWLImmutableOntologyImpl.getAxioms(OWLImmutableOntologyImpl.java:1325)
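For context, the usual fix for this kind of OutOfMemoryError is to raise the JVM's maximum heap (-Xmx) on the loader invocation. A minimal sketch only; the jar name/path and the 64G figure are placeholders, not the pipeline's actual settings:

```sh
# Placeholder sketch: jar path, argument, and heap size are all assumptions.
JAVA_HEAP="64G"
java -Xmx"${JAVA_HEAP}" -jar owltools-runner.jar neo.owl
```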
Hi Seth
@hdrabkin I believe that this is a different issue. Yours should be cleared on the completion of
Previously discussed with @cmungall, we would spin out this branch into a new top-level pipeline. After starting work on that, I do not believe it's viable compared to formalizing it as a new branch in the current pipeline: it would either be a very fiddly piece of code that would have to play carefully so as not to accidentally clobber skyhook locations, or it would require a small rewrite of how skyhook works. While neither of these is insurmountable, given the small and likely temporary nature of this pipeline, I think formalizing the current branch into something slightly more permanent is the fastest and safest way forward.
Discussed with @goodb how to make this a workable transition:
With the completion of this, we can now either build the GOlr index for go-lego in the main pipeline, or do it elsewhere. Deployment would still be once a week or so, so it may be fine to keep the degenerate
From the call today, talking with @goodb and @balhoff, these are the next steps. For Noctua GOlr:
For Minerva:
Clarification:
Probably the best thing is to add this to the ontology makefile by making a go-lego-reacto-edit.ofn file, adding in reacto.owl as an import, and adding the target to the makefile just like the go-lego one. Note the issue of having the code to make reacto in a different location from the ontology makefile, so synchronization may be an issue.
@goodb Hm. I don't think there is necessarily any issue with that, as reacto.owl is made during a normal pipeline run anyway and this experimental pipeline will eventually be folded into that. I suppose there is a bit of a trick with the references here, but hopefully that could be accomplished with a catalog or a materialized ontology. That said, it would actually be a convenience to have reacto.owl in the GO Makefile as well, would it not?
It would indeed be more convenient to have reacto built in the main makefile along with the others. We might actually promote that to a policy: that all ontology products are produced there. Downstream things like journals and indexes could happen elsewhere in the pipeline.
The building of reacto.owl is just a few lines and it feels like it would be an easy win, and easy to back out of if necessary. It seems to need a single remote file and a single binary available for the build, possibly supplied by optional environment variables. As that binary is a release, it might be nice just to add it as a lib to the go-ontology repo to cut the external dependency and make it a little more self-contained.
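To illustrate the "optional environment variables" idea, a sketch of what the build step could look like; every name, URL, and default below is hypothetical and not an existing convention in the repo:

```sh
# Hypothetical defaults; override from the pipeline environment if needed.
REACTO_SOURCE_URL="${REACTO_SOURCE_URL:-https://example.org/reactome-source.zip}"
REACTO_BUILD_JAR="${REACTO_BUILD_JAR:-./bin/reacto-builder.jar}"

# Fetch the single remote input and run the single binary to produce reacto.owl.
curl -L -o reacto-source.zip "$REACTO_SOURCE_URL"
java -jar "$REACTO_BUILD_JAR" reacto-source.zip reacto.owl
```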
@kltm I don't see any reason not to go forward as you suggest. At some point I'd like to figure out why the source code build wasn't working in the pipeline environment and get it posted to Maven. For now, I think the binary release approach we have ought to work. If a merge is needed, it's a pretty straightforward robot command: http://robot.obolibrary.org/merge. I can work on it this weekend if you want.
@goodb Okay, it would be great if you could go ahead with this. If at all possible, please be mindful of the relative positions of the files in the directory hierarchy: /ontology/extensions/reacto.owl. In the meantime, let me know if you'd like me to do the relatively straightforward merge to produce go-lego-reacto.owl to unblock you on testing minerva and co.
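For reference, the merge itself is as simple as the linked robot documentation suggests; a sketch, where the input paths are assumptions about the local checkout layout:

```sh
# Merge reacto.owl into go-lego.owl to produce go-lego-reacto.owl.
robot merge \
  --input go-lego.owl \
  --input ontology/extensions/reacto.owl \
  --output go-lego-reacto.owl
```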
…docker fail) and split derivatives so we can restart faster; work on #35
@goodb My current understanding is that, while this pipeline is still separate, it is now creating all of the products that we want. Besides merging this back into the main pipeline (which may have to wait until we get some speed improvements), we still have some work to do on checking NEO here: https://github.com/geneontology/pipeline/blob/issue-35-neo-sanity-test/Jenkinsfile#L352 . @dougli1sqrd @goodb Is this still in progress?
Yes, I think that's still ongoing. I need to revisit to see the exact state; it's been a little while since I last looked at it.
@kltm my understanding matches yours with regard to the pipeline build. Regarding testing the products, I had written a couple of simple SPARQL queries that could be run on the generated, merged ontologies. I had handed this off to @dougli1sqrd to pipelinify. It looks like he is running something on the merged go-lego.owl file using Robot. That ought to work, but if it's slow, the tests could be moved downstream to make use of the blazegraph journals that are now being generated. Test queries could be run with blazegraph-runner and ought to be fast. BTW @dougli1sqrd, running something with neo in its name ("sparql/neo/profile.txt") against go-lego.owl probably no longer makes sense. neo.owl is not currently included in go-lego.owl; it needs to be treated separately. Downstream there is a blazegraph journal that does merge them, if you need them together.
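A sketch of the kind of check being described, run with robot against the separate neo.owl rather than go-lego.owl; the query file name and output name here are illustrative only:

```sh
# Run an illustrative SPARQL SELECT against neo.owl and write a TSV report.
# 'sparql/neo/profile-check.rq' is a hypothetical query path.
robot query \
  --input neo.owl \
  --query sparql/neo/profile-check.rq neo-profile-report.tsv
```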
From the software call today: as part of this item, we don't want to forget to make reacto creation and exposure better.
The general idea would be to eliminate as much mechanism as possible around the deployment and maintenance of multiple pipelines and servers. To this end, I've proposed that NEO (the neo.owl owltools ontology load, sorry @cmungall) gets folded into the main Solr load and index. This would simply be:
A separate issue, not dealt with here, would be adding the creation of neo.owl itself. As we are just pulling from a URL, this can be separated.
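A sketch of that "pulling from a URL" step; the PURL below is a best guess at the published location and should be checked against the actual release setup:

```sh
# Fetch the published neo.owl (URL is an assumption) ahead of the Solr load step.
curl -L -o neo.owl "http://purl.obolibrary.org/obo/go/noctua/neo.owl"
```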
Another, weaker, formulation would be to drop the NEO index separately, but within the new pipeline framework and runs.