Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added ability to include nodes marked as obsolete #15

Merged
merged 6 commits into from
Sep 27, 2020
Merged
7 changes: 5 additions & 2 deletions obonet/read.py
100644 → 100755
Original file line number Diff line number Diff line change
Expand Up @@ -6,7 +6,7 @@
from .io import open_read_file


def read_obo(path_or_file):
def read_obo(path_or_file, ignore_obsolete=True):
"""
Return a networkx.MultiDiGraph of the ontology serialized by the
specified path or file.
Expand All @@ -19,6 +19,9 @@ def read_obo(path_or_file):
path_or_file : str or file
Path, URL, or open file object. If path or URL, compression is
inferred from the file extension.
ignore_obsolete : boolean
When true (default), terms that are marked 'is_obsolete' will
not be added to the graph.
"""
obo_file = open_read_file(path_or_file)
typedefs, terms, instances, header = get_sections(obo_file)
Expand All @@ -32,7 +35,7 @@ def read_obo(path_or_file):
edge_tuples = list()

for term in terms:
is_obsolete = term.get('is_obsolete', 'false') == 'true'
is_obsolete = ignore_obsolete and term.get('is_obsolete', 'false') == 'true'
if is_obsolete:
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

perhaps keep is_obsolete = term.get('is_obsolete', 'false') == 'true' but change the next line to:

if ignore_obsolete and is_obsolete:

continue
term_id = term.pop('id')
Expand Down
17 changes: 15 additions & 2 deletions tests/test_obo_reading.py
100644 → 100755
Original file line number Diff line number Diff line change
Expand Up @@ -17,8 +17,9 @@ def test_read_taxrank_file():
with open(path, 'rt') as read_file:
taxrank = obonet.read_obo(read_file)
assert len(taxrank) == 61
assert taxrank.node['TAXRANK:0000001']['name'] == 'phylum'
assert 'NCBITaxon:kingdom' in taxrank.node['TAXRANK:0000017']['xref']
# It looks like networkx has changed the name of the node variable to nodes -- EST 2020-09-25
assert taxrank.nodes['TAXRANK:0000001']['name'] == 'phylum'
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks like neworkx 2.4 removed G.nodes, which had been marked as deprecated in 2.1.

not sure why Travis CI builds are not showing up on GitHub for this pull request, but see the failures at https://travis-ci.org/github/dhimmel/obonet/builds/730420444. We support both networkx 1.x and 2.x.

You could add pytest.importorskip("networkx", minversion="2.0") to this test to skip it on 1.x builds. Removing 1.x support might be something we do soon anyways.

Copy link
Contributor Author

@torstees torstees Sep 26, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The original (taxrank.node) fails with networkx 2.5:

>       assert taxrank.node['TAXRANK:0000001']['name'] == 'phylum'
E       AttributeError: 'MultiDiGraph' object has no attribute 'node'

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The original (taxrank.node) fails with networkx 2.5:

yes. we should change to using .node, although this will cause failure for the networkx 1.x builds. Therefore, I was thinking we could just skip the taxrank tests if networkx < 2

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

taxrank.node fails for me when using networkx 2.5, but I left the failing test as it was originally.

assert 'NCBITaxon:kingdom' in taxrank.nodes['TAXRANK:0000017']['xref']


@pytest.mark.parametrize('extension', ['', '.gz', '.bz2', '.xz'])
Expand Down Expand Up @@ -118,3 +119,15 @@ def test_parse_tag_line_backslashed_exclamation():
tag, value, trailing_modifier, comment = parse_tag_line(line)
assert tag == 'synonym'
assert value == r'not a real example \!'

def test_ignore_obsolete_nodes():
hpo = obonet.read_obo("http://purl.obolibrary.org/obo/hp.obo")
nodes = hpo.nodes(data=True)
assert "HP:0005549" not in nodes

def test_presence_of_obsolete_nodes():
hpo = obonet.read_obo("http://purl.obolibrary.org/obo/hp.obo", ignore_obsolete=False)
nodes = hpo.nodes(data=True)
assert "HP:0005549" in nodes
node = nodes['HP:0005549']
assert node['is_obsolete'] == 'true'
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice tests.

hp.obo looks to be 6.7M, so for the purpose of keeping these tests speedy, we might want to test on a simple ontology that we store locally. See examples in tests/data. This also protects us from future test failure if something changes at that URL.

One easy way might be to add the following to tests/data/brenda-subset.obo (source):

[Term]
id: BTO:0000311
name: culture filtrate
comment: Added as synonym to culture fluid.
is_obsolete: true