image download for http://arctos.database.museum/media/10562312?open fails with http return code 450 (Blocked by Windows Parental Controls) #3950

jhpoelen · 2021-09-22T14:18:40Z

When I access the image at http://arctos.database.museum/media/10562312?open programmatically, the download fails. When I access the same image via a browser, it can be retrieved.

To Reproduce
Steps to reproduce the behavior:

1. Run curl -L "http://arctos.database.museum/media/10562312?open" > image.jpg to save the image into the file "image.jpg".
2. Open image.jpg in an image viewer.
3. Inspect the image.

Expected behavior
The referenced image is saved to image.jpg.

Actual behavior
The referenced image is not saved to image.jpg, but it is available in a browser (see attached).
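
Without extra flags, curl writes whatever body the server returns, so an error response can leave an HTML page in image.jpg. A minimal sketch of a local check, assuming the expected file is a JPEG; the helper names looks_like_jpeg and check_download are hypothetical and not part of any tool mentioned in this issue:

```python
from pathlib import Path

# JPEG data starts with the Start Of Image marker (FF D8) followed by FF.
JPEG_MAGIC = b"\xff\xd8\xff"


def looks_like_jpeg(data: bytes) -> bool:
    """Return True if the bytes begin with the JPEG signature."""
    return data.startswith(JPEG_MAGIC)


def check_download(path: str) -> bool:
    """Hypothetical helper: report whether a saved file looks like a JPEG."""
    return looks_like_jpeg(Path(path).read_bytes())
```

Calling check_download("image.jpg") after the curl step returns False when the file holds an HTML error page instead of image data.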

For details on the error message, see bio-guoda/preston#132.

$ curl -L -I "http://arctos.database.museum/media/10562312?open"
HTTP/1.1 450 
Set-Cookie: cfid=07a11439-41f1-4b3b-94c2-a9d265790e57;Path=/;Expires=Tue, 12-Oct-2021 15:47:31 UTC;HttpOnly
Set-Cookie: cftoken=0;Path=/;Expires=Tue, 12-Oct-2021 15:47:31 UTC;HttpOnly
Set-Cookie: cfid=94c04dfc-4be9-4664-b3df-58c8e72cf7c8;Path=/;Expires=Tue, 12-Oct-2021 15:47:31 UTC;HttpOnly
Set-Cookie: cftoken=0;Path=/;Expires=Tue, 12-Oct-2021 15:47:31 UTC;HttpOnly
Set-Cookie: JSESSIONID=8F32EB00537A1924C4F5806180E4C98A; Path=/; HttpOnly
Content-Type: text/html;charset=UTF-8
Content-Length: 616
Date: Wed, 22 Sep 2021 14:09:28 GMT
Server: lighttpd/1.4.55
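
With -L and -I, curl prints one header block per response in the redirect chain, so the outcome of a request can be read from the last status line. A minimal sketch of that check, assuming the header dump has been captured as text; the function names status_chain and download_ok are hypothetical:

```python
import re
from typing import List

# Matches status lines such as "HTTP/1.1 450 " or "HTTP/2 200".
STATUS_LINE = re.compile(r"^HTTP/\d(?:\.\d)?\s+(\d{3})")


def status_chain(header_dump: str) -> List[int]:
    """Return the status code of every response in a `curl -L -I` dump, in order."""
    codes = []
    for line in header_dump.splitlines():
        match = STATUS_LINE.match(line.strip())
        if match:
            codes.append(int(match.group(1)))
    return codes


def download_ok(header_dump: str) -> bool:
    """True only if the last response in the chain has status 200."""
    codes = status_chain(header_dump)
    return bool(codes) and codes[-1] == 200
```

For the output above, status_chain yields a single 450 and download_ok is False, which a script can use to skip saving the body as an image.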

Screenshots

Desktop (please complete the following information):

OS: Ubuntu 18.04 LTS
Browser: firefox
Version: 90.0.2

dustymc · 2021-09-22T14:29:55Z

450 Blocked by Windows Parental Controls (Microsoft)

Wat?

Curl works for me; the error suggests to me that the problem is your network (or some device on it), but ????


Dustys-MBP:~ dlm$ curl -L -I "http://arctos.database.museum/media/10562312?open"
HTTP/1.1 302 Found
Set-Cookie: cfid=fcba232b-d878-4634-8449-546a87c9d9bf;Path=/;Expires=Tue, 12-Oct-2021 16:03:54 UTC;HttpOnly
Set-Cookie: cftoken=0;Path=/;Expires=Tue, 12-Oct-2021 16:03:54 UTC;HttpOnly
Set-Cookie: JSESSIONID=D65196FF8732058E64C02716701EA618; Path=/; HttpOnly
Location: https://web.corral.tacc.utexas.edu/CAS/20161217-02/jpg/chas_mamm_3302.7.jpg
Content-Type: text/html;charset=UTF-8
Content-Length: 6398
Date: Wed, 22 Sep 2021 14:25:50 GMT
Server: lighttpd/1.4.55

HTTP/1.1 200 OK
Content-Type: image/jpeg
Accept-Ranges: bytes
ETag: "4236783506"
Last-Modified: Tue, 10 Jan 2017 22:55:32 GMT
Strict-Transport-Security: max-age=15768000;
Content-Length: 25517
Date: Wed, 22 Sep 2021 14:25:51 GMT
Server: lighttpd/1.4.55

jhpoelen · 2021-09-22T14:43:13Z

Interesting. I saw the error on a server based in Germany.

When I run the same command on my US internet connection:

$ curl -L -I "http://arctos.database.museum/media/10562312?open"
HTTP/1.1 302 Found
Set-Cookie: cfid=82d23655-6328-4d45-ae7d-67108d319c8c;Path=/;Expires=Tue, 12-Oct-2021 16:19:19 UTC;HttpOnly
Set-Cookie: cftoken=0;Path=/;Expires=Tue, 12-Oct-2021 16:19:19 UTC;HttpOnly
Set-Cookie: JSESSIONID=192E43CDAA352DC698D37F3219620E8C; Path=/; HttpOnly
Location: https://web.corral.tacc.utexas.edu/CAS/20161217-02/jpg/chas_mamm_3302.7.jpg
Content-Type: text/html;charset=UTF-8
Content-Length: 6398
Date: Wed, 22 Sep 2021 14:41:15 GMT
Server: lighttpd/1.4.55

HTTP/1.1 200 OK
Content-Type: image/jpeg
Accept-Ranges: bytes
ETag: "4236783506"
Last-Modified: Tue, 10 Jan 2017 22:55:32 GMT
Strict-Transport-Security: max-age=15768000;
Content-Length: 25517
Date: Wed, 22 Sep 2021 14:41:17 GMT
Server: lighttpd/1.4.55

Are the images geo fenced? Or did this German server somehow end up on some blacklist?

dustymc · 2021-09-22T14:50:29Z

> geo fenced

No, nor is anything else.

> somehow end up on some blacklist

Very likely, especially if it's sharing IP space with some AWS-like farm. I'm happy to see what I can do if you want to provide an IP, but in general I don't have the resources to properly manage that kind of traffic, so I just run an aggressive blocker.

jhpoelen · 2021-09-22T14:52:57Z

OK. If I provide some IP addresses, can you put them on a whitelist?

jhpoelen · 2021-09-22T19:36:08Z

> I'm happy to see what I can do if you want to provide an IP

I didn't see your offer before. Yes, I will provide a list of (two) IPs to you via other channels.

dustymc · 2021-09-22T20:28:05Z

Thanks, I opened both of those.

As above, I don't have the resources to really manage this sort of thing, and there's a fair bit of not-so-great traffic from both of those so no promises that they won't get locked back down. I will do whatever I can safely do if there are more problems, and we can elevate through the Arctos administrative channels if that doesn't prove satisfactory. I assume you're doing something cool and I think we'd all like to support it, but - at the risk of sounding like a broken record - resources.....

Do please note https://arctos.database.museum/robots.txt - we're asking for a 10-second crawl delay, and there is some 'you look like an SEO bot that we have neither the resources nor desire to feed' logic around that.

Also please note that Media are licensed (see eg http://arctos.database.museum/media/10562312) - I'm not sure where else to go with that, but the idea that we make it possible to get the media without the metadata comes up from time to time so there it is....

jhpoelen · 2021-09-22T22:11:23Z

Hey @dustymc -

Thanks for manually editing your whitelist.

I can confirm that the provided addresses now have access to the previously blocked content.

The curl request now succeeds:

$ curl -L -I "http://arctos.database.museum/media/10562312?open"
HTTP/1.1 302 Found
Set-Cookie: cfid=5ff0d7f1-f249-4c16-96f1-c4ceb541f4bb;Path=/;Expires=Tue, 12-Oct-2021 23:45:31 UTC;HttpOnly
Set-Cookie: cftoken=0;Path=/;Expires=Tue, 12-Oct-2021 23:45:31 UTC;HttpOnly
Set-Cookie: JSESSIONID=87FDAD1A9CCE281FEE023B2E720056E0; Path=/; HttpOnly
Location: https://web.corral.tacc.utexas.edu/CAS/20161217-02/jpg/chas_mamm_3302.7.jpg
Content-Type: text/html;charset=UTF-8
Content-Length: 6398
Date: Wed, 22 Sep 2021 22:07:27 GMT
Server: lighttpd/1.4.55

HTTP/1.1 200 OK
Content-Type: image/jpeg
Accept-Ranges: bytes
ETag: "4236783506"
Last-Modified: Tue, 10 Jan 2017 22:55:32 GMT
Strict-Transport-Security: max-age=15768000;
Content-Length: 25517
Date: Wed, 22 Sep 2021 22:07:30 GMT
Server: lighttpd/1.4.55

jhpoelen · 2021-09-22T22:25:08Z

I can see your point about resources. This is something that I've been bringing up, and hoping to discuss more, in various meetings: the (hidden) cost of keeping and transferring "heavy" content like images, especially when they are stored centrally.

The tools I am building package the metadata, including licensing, along with the images. The integrity of the resulting image corpus can be verified, so you can trace exactly what was retrieved at what time, and what came before and after. Because the content is identified with hashes, the image corpus can be moved to other web locations (or offline storage) without compromising its integrity or the ability to verify that integrity. This also means that the copyright is associated with the exact image that was linked at some point in time. So, in theory, you can use this same technique (or other content fingerprinting techniques like spectral analysis / image statistics) to determine whether someone is using, or keeping a copy of, an image from your collection. In theory, this would also allow the content to be stored in a decentralized manner (perhaps with the Internet Archive or Library of Congress hosting the "heavy" images), while systems like Arctos provide the meaningful connections between the digital artifacts (copyright relations, associations with specimen records, etc.).
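
The idea of identifying content by hash and keeping the license next to it can be sketched as follows. This is a minimal illustration, not the format the tools mentioned above actually use: the choice of SHA-256, the "sha256:" prefix, and the names content_id, verify, and make_record are all assumptions made for the example.

```python
import hashlib


def content_id(data: bytes) -> str:
    """Return a hash-based identifier for the bytes (SHA-256 is an assumption)."""
    return "sha256:" + hashlib.sha256(data).hexdigest()


def verify(data: bytes, expected_id: str) -> bool:
    """Check that the bytes still match the identifier recorded at retrieval time."""
    return content_id(data) == expected_id


def make_record(url: str, data: bytes, license_url: str, retrieved_at: str) -> dict:
    """Hypothetical provenance record tying an image to its source and license."""
    return {
        "url": url,
        "content_id": content_id(data),
        "license": license_url,
        "retrieved_at": retrieved_at,
    }
```

Because the identifier depends only on the bytes, a copy stored elsewhere can be checked with verify without contacting the original server.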

Anyways, tons to talk and think about, and I'd be open to having a live conversation about this.

The example image of the bat jaw that triggered this issue came up as I was showing a proof-of-concept to Kendra Phelps, a member of the EcoHealth Alliance and collaborator in a biodiversity data hub, at https://jhpoelen.nl/bats. In this example, only media associated with 100 specimens are shown, and only some of those come from Arctos.

Also, I'll make a note of figuring out a way to look for robots.txt and adjust request behavior based on it.
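
A minimal sketch of what honoring a crawl delay could look like, using Python's standard urllib.robotparser. The helper crawl_delay_from and the class PoliteFetcher are hypothetical names for this example, and the robots.txt lines passed in are supplied by the caller rather than fetched here:

```python
import time
from urllib.robotparser import RobotFileParser


def crawl_delay_from(robots_lines, user_agent="*", default=0.0):
    """Return the Crawl-delay (seconds) that applies to user_agent, or default."""
    parser = RobotFileParser()
    parser.parse(robots_lines)
    delay = parser.crawl_delay(user_agent)
    return float(delay) if delay is not None else default


class PoliteFetcher:
    """Hypothetical throttle: waits so that requests are at least delay seconds apart."""

    def __init__(self, delay_seconds, clock=time.monotonic, sleep=time.sleep):
        self._delay = delay_seconds
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous request, if any

    def wait(self):
        """Block until the delay since the previous request has elapsed."""
        if self._last is not None:
            remaining = self._delay - (self._clock() - self._last)
            if remaining > 0:
                self._sleep(remaining)
        self._last = self._clock()
```

A client would call wait() before each request to the same host, with the delay taken from crawl_delay_from applied to that host's robots.txt lines.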

jhpoelen · 2021-09-29T19:31:11Z

@dustymc thanks for helping to make your images accessible. I am closing this issue for now, and I hope to pick up the discussion around distributing content (and costs) so that images can be preserved across willing institutions and projects without impacting the integrity of the images.

jhpoelen added the "Bug" label (Arctos is not performing as it should.) Sep 22, 2021

dustymc added this to the Need More Information milestone Sep 22, 2021

jhpoelen closed this as completed Sep 29, 2021

jhpoelen mentioned this issue Sep 29, 2021

image missing from image corpus http://arctos.database.museum/media/10562312?open bio-guoda/preston#132

Closed

dustymc mentioned this issue Nov 4, 2021

Media - pre-check handling #4071

Closed
