Fix fetching example URLs when updating signature #84

Closed
Hwesta opened this issue Oct 14, 2016 · 1 comment

Hwesta commented Oct 14, 2016

When updating signatures, if a format has a ReferenceFileIdentifier of type URL, we include a reference to it, which involves fetching the URL and calculating a checksum. However, ReferenceFileIdentifier is not consistent in its meaning or format.

For example, in PRONOM 88 the identifiers for fmt/11 start with www (no scheme), and each URL points directly to a PNG file:

<ReferenceFileIdentifier>
  <Identifier>www.w3.org/Graphics/PNG/nurbcup2si.png</Identifier>
  <IdentifierType>URL</IdentifierType>
</ReferenceFileIdentifier>
...
<ReferenceFileIdentifier>
  <Identifier>www.w3.org/Graphics/PNG/666.png</Identifier>
  <IdentifierType>URL</IdentifierType>
</ReferenceFileIdentifier>

Compare that with fmt/569, where the identifier starts with http:// and points to an HTML page that links to examples rather than to an example file:

<ReferenceFileIdentifier>
  <Identifier>http://www.matroska.org/downloads/test_w1.html</Identifier>
  <IdentifierType>URL</IdentifierType>
</ReferenceFileIdentifier>

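When parsing the identifier, we always prepend http:// and fetch the result. That breaks for http://www.matroska.org/downloads/test_w1.html, because the identifier already includes the scheme and the resulting URL starts with http://http://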

url = "http://" + get_text_tna(id, 'Identifier')
...
sock = urlopen(url)

Options include removing the examples and checksums from formats-v##.xml, or adding error handling around that section.
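
As a rough illustration of the error-handling option, here is a minimal sketch and not the project's actual fix. It assumes Python 3's urllib.request; the helper names normalize_reference_url and fetch_checksum are hypothetical, and SHA-256 is chosen only for illustration since the issue does not say which checksum is used. The idea is to prepend http:// only when the identifier has no scheme, and to skip the checksum when the fetch fails.

```python
import hashlib
from http.client import HTTPException
from urllib.error import URLError
from urllib.request import urlopen


def normalize_reference_url(identifier):
    """Prepend http:// only when the identifier has no scheme (hypothetical helper)."""
    identifier = identifier.strip()
    if identifier.lower().startswith(("http://", "https://")):
        return identifier
    return "http://" + identifier


def fetch_checksum(url, opener=urlopen, timeout=10):
    """Return a hex digest of the body at url, or None if the fetch fails.

    SHA-256 is an assumption made for this sketch; `opener` is injectable
    so the function can be exercised without network access.
    """
    try:
        sock = opener(url, timeout=timeout)
        try:
            data = sock.read()
        finally:
            sock.close()
    except (URLError, HTTPException, ValueError, OSError):
        # Unreachable host, HTTP error, malformed URL, or timeout:
        # report "no checksum" instead of aborting the signature update.
        return None
    return hashlib.sha256(data).hexdigest()
```

With these helpers, url = normalize_reference_url(get_text_tna(id, 'Identifier')) would leave the fmt/569 identifier unchanged and still add the scheme for the fmt/11 identifiers. Note that this only avoids the crash: for fmt/569 the checksum would be that of the HTML page, not of an example file, so whether to record it at all remains a separate decision.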

jhsimpson added this to the 1.3.6 milestone Jun 29, 2017

jhsimpson commented

This issue appears to be fixed by #101
