Adding support for encoding in read_obo / UnicodeDecodeError for Obonet0.3.1 (Cell Ontology) #27

thomcsmits · 2023-02-27T20:21:36Z

Similar to #25, I am getting a UnicodeDecodeError. The solution there was to upgrade to v0.3.0, but as far as I can tell, setup.py still doesn't specify an encoding?

I have my OBO file locally (I downloaded cl.obo from https://obofoundry.org/ontology/cl.html).

import networkx
import obonet

path = "./data/ontology/cl.obo"
graph = obonet.read_obo(path)

obonet==0.3.1
networkx==3.0

Without running a local version of obonet with the encoding specified, how can I best resolve this error?

Any chance to add support for specifying an encoding in read_obo?

dhimmel · 2023-02-27T20:40:33Z

#25 is about an encoding issue while installing the obonet package and not when calling obonet.read_obo. So this sounds like a different problem?

Can you provide the error message that occurs from obonet.read_obo(path)?

dhimmel · 2023-02-27T20:46:03Z

By the way, does the following work?

# unversioned
obonet.read_obo("http://purl.obolibrary.org/obo/cl/cl-basic.obo")
# versioned
obonet.read_obo("https://github.com/obophenotype/cell-ontology/releases/download/v2023-02-19/cl-basic.obo")

thomcsmits · 2023-02-27T21:03:17Z

Thanks for the fast answer!! This is the error message:

Traceback (most recent call last):
  File "<dir>\src\ontology_obonet.py", line 23, in <module>
    graph = obonet.read_obo(path)
  File "<dir>\.venv\lib\site-packages\obonet\read.py", line 30, in read_obo
    typedefs, terms, instances, header = get_sections(obo_file)
  File "<dir>\.venv\lib\site-packages\obonet\read.py", line 77, in get_sections
    stanza_lines = list(stanza_lines)
  File "C:\Users\<user>\AppData\Local\Programs\Python\Python310\lib\encodings\cp1252.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 8161: character maps to <undefined>

Reading directly from the URL works. However, downloading the same version from https://github.com/obophenotype/cell-ontology/releases/download/v2023-02-19/cl-basic.obo and referencing it locally gives the same error as above.

Thanks for the help, I will just use the URL!

dhimmel · 2023-02-27T21:28:26Z

Okay, I think the issue is that you're on Windows, where Python opens text files with the locale's default encoding (cp1252, judging by your traceback) rather than UTF-8. According to PEP 686, Python 3.15 will start using UTF-8 for opening files on Windows by default. Until then, you can change the default for Python by setting the environment variable PYTHONUTF8=1.
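The failure can be reproduced without obonet. The sketch below is a minimal illustration using only the standard library, not obonet's code: it writes a small UTF-8 file containing a right double quotation mark (U+201D, one character whose UTF-8 encoding contains byte 0x9d, the byte named in the traceback above), then decodes it once as cp1252 and once as UTF-8. The file name and the sample text are made up for the example.

```python
import tempfile
from pathlib import Path

# Made-up OBO-like text. U+201D (right double quotation mark) is encoded
# in UTF-8 as the bytes e2 80 9d.
text = "name: \u201cexample\u201d term\n"

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "example.obo"
    path.write_bytes(text.encode("utf-8"))

    # Decoding as cp1252 fails because byte 0x9d has no mapping there.
    try:
        path.read_text(encoding="cp1252")
        cp1252_error = None
    except UnicodeDecodeError as error:
        cp1252_error = error

    # Naming the encoding explicitly behaves the same on every platform.
    decoded = path.read_text(encoding="utf-8")

print(cp1252_error)
print(decoded == text)
```

Calling `open` or `read_text` without an `encoding` argument on Windows is what selects the locale encoding, so the same file can load on Linux or macOS and fail on Windows.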

But this is only a workaround. The best solution would be for obonet.read_obo to accept a character encoding that gets passed to the opener. That way you could specify utf-8 for cl-basic.obo, or a different encoding for an ontology that uses one.
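As a rough sketch of that idea (not obonet's actual implementation), a reader can take an `encoding` argument and forward it unchanged to `open`. The function name `read_stanza_lines` and the sample file are hypothetical and exist only for this example.

```python
import tempfile
from pathlib import Path


def read_stanza_lines(path, encoding="utf-8"):
    """Hypothetical reader that forwards `encoding` to the opener.

    Passing encoding=None falls back to the platform default, which is
    the behaviour that fails on Windows for UTF-8 files.
    """
    with open(path, encoding=encoding) as handle:
        return [line.rstrip("\n") for line in handle]


with tempfile.TemporaryDirectory() as tmp:
    # Made-up two-line file saved as UTF-8 with one non-ASCII character.
    sample = Path(tmp) / "sample.obo"
    sample.write_bytes("[Term]\nname: caf\u00e9 cell\n".encode("utf-8"))
    lines = read_stanza_lines(sample, encoding="utf-8")

print(lines)
```

Until such a parameter exists, a caller could try opening the local file with `encoding="utf-8"` and handing the open file object to obonet.read_obo. This assumes read_obo accepts an open file object, which should be checked against the installed version.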

dhimmel · 2023-02-28T14:35:28Z

Okay, e6ff647 reproduced the encoding error on the Windows CI job!

E       UnicodeDecodeError: 'charmap' codec can't decode byte 0x8f in position 347: character maps to <undefined>
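Both bytes reported in this thread, 0x9d and 0x8f, are among the few that Python's cp1252 codec leaves unmapped, which is why decoding raises an error instead of silently producing wrong characters. A quick check, using only the standard library:

```python
# Collect every single byte that Python's cp1252 codec refuses to decode.
undefined = []
for value in range(256):
    try:
        bytes([value]).decode("cp1252")
    except UnicodeDecodeError:
        undefined.append(value)

print([hex(value) for value in undefined])
```

Any UTF-8 file whose multi-byte sequences happen to include one of these bytes will fail under cp1252. Files that avoid them decode without error but contain garbled characters.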

dhimmel added a commit that referenced this issue Feb 28, 2023

test_read_brenda_subset: add unicode characters

e6ff647

refs #27

dhimmel closed this as completed in 470e2ef Feb 28, 2023

dhimmel mentioned this issue Feb 28, 2023

Add encoding parameter in read_obo function #28

Closed
