prepare: handle fully-qualified resource URLs
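Prior to PRONOM 89, resource URLs were missing the scheme; there is now a
mixture of fully-qualified URLs and URLs without schemes. Prepending
"http://" unconditionally turned the fully-qualified URLs into malformed
ones such as "http://http://...", which caused fetching in prepare to fail.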
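A minimal sketch of the failure, assuming Python 3's `urllib.parse` (the commit itself imports `urlparse` through `six.moves`). The old code built the URL as `"http://" + get_text_tna(id, 'Identifier')`; `naive_url` below is a hypothetical stand-in for that expression, and the example identifiers are made up.

```python
from urllib.parse import urlparse


def naive_url(identifier):
    # Hypothetical stand-in for the pre-fix code: always prepend a scheme.
    return "http://" + identifier


# Made-up identifiers: one in the old bare style, one fully qualified.
bare = "www.example.com/spec.pdf"
qualified = "http://www.example.com/spec.pdf"

print(naive_url(bare))       # http://www.example.com/spec.pdf
print(naive_url(qualified))  # http://http://www.example.com/spec.pdf

# The doubled scheme parses with host "http"; the real host ends up in the path.
broken = urlparse(naive_url(qualified))
print(broken.hostname, broken.path)  # http //www.example.com/spec.pdf
```

Any fetch of the doubled URL is therefore aimed at a host literally named `http`, not at the intended server.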
mistydemeo authored and jhsimpson committed Jun 16, 2017
1 parent 0611a97 commit e845553
1 changed file: fido/prepare.py (7 changes: 6 additions, 1 deletion)
```diff
@@ -14,6 +14,7 @@

 from six.moves import cStringIO
 from six.moves.urllib.request import urlopen
+from six.moves.urllib.parse import urlparse

 from .pronomutils import get_local_pronom_versions

@@ -272,7 +273,11 @@ def parse_pronom_xml(self, source, puid_filter=None):
 for id in x.findall(TNA('ReferenceFileIdentifier')):
 type = get_text_tna(id, 'IdentifierType')
 if type == 'URL':
-url = "http://" + get_text_tna(id, 'Identifier')
+# Starting with PRONOM 89, some URLs contain http://
+# and others do not.
+url = get_text_tna(id, 'Identifier')
+if not urlparse(url).scheme:
+url = "http://" + url
 ET.SubElement(rf, 'dc:identifier').text = url
 # And calculate the checksum of this resource:
 m = hashlib.md5()
```
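The scheme check from this commit can be pulled out into a standalone helper so it is easy to test in isolation. This is a sketch, assuming Python 3's `urllib.parse` in place of the `six.moves` shim; the function name `normalize_reference_url`, its `default_scheme` parameter and the sample URLs are hypothetical, not part of fido.

```python
from urllib.parse import urlparse


def normalize_reference_url(identifier, default_scheme="http"):
    """Return identifier as a fully qualified URL (hypothetical helper).

    Mirrors the commit's logic: add a scheme only when urlparse finds none,
    so identifiers that already carry a scheme are returned unchanged.
    """
    if urlparse(identifier).scheme:
        return identifier
    return default_scheme + "://" + identifier


# Made-up samples covering the mixture described in the commit message.
samples = [
    "www.example.com/spec.pdf",          # old style, no scheme
    "http://www.example.com/spec.pdf",   # fully qualified
    "https://www.example.com/spec.pdf",  # another scheme, left alone
]
for s in samples:
    print(normalize_reference_url(s))
# http://www.example.com/spec.pdf
# http://www.example.com/spec.pdf
# https://www.example.com/spec.pdf
```

Because the check only adds a scheme when none is present, applying the helper twice gives the same result as applying it once. One caveat worth checking: `urlparse` treats any leading run of scheme characters followed by `:` as a scheme, so a bare `host:port/path` identifier appears to be parsed with the host as its scheme and would be left without `http://`.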
