image download for http://arctos.database.museum/media/10562312?open fails with http return code 450 (Blocked by Windows Parental Controls) #3950
Comments
Wat? Curl works for me, the error suggests to me your network (or some device on it) but ????
Interesting. I saw the error on a server based in Germany. When I run the same command on my US internet connection:
Are the images geo fenced? Or did this German server somehow end up on some blacklist? |
No, nor is anything else.
Very likely, especially if it's sharing IP space with some AWS-like farm. I'm happy to see what I can do if you want to provide an IP, but in general I don't have the resources to properly manage that kind of traffic so just run an aggressive blocker. |
OK. If I provide some IP addresses, can you put them on a whitelist? |
I didn't see your offer before. Yes, I will provide a list of (two) IPs to you via other channels. |
Thanks, I opened both of those. As above, I don't have the resources to really manage this sort of thing, and there's a fair bit of not-so-great traffic from both of those, so no promises that they won't get locked back down. I will do whatever I can safely do if there are more problems, and we can escalate through the Arctos administrative channels if that doesn't prove satisfactory. I assume you're doing something cool and I think we'd all like to support it, but, at the risk of sounding like a broken record, resources... Do please note https://arctos.database.museum/robots.txt: we're asking for a 10-second crawl delay, and there is some 'you look like an SEO bot that we have neither the resources nor desire to feed' logic around that. Also please note that Media are licensed (see e.g. http://arctos.database.museum/media/10562312). I'm not sure where else to go with that, but the idea that we make it possible to get the media without the metadata comes up from time to time, so there it is... |
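For readers who want to honor the crawl delay mentioned above, here is a minimal sketch using Python's standard `urllib.robotparser`. The sample robots.txt lines, the `example-fetcher` user agent, and the `crawl_delay_for` helper are hypothetical; only the 10-second delay comes from the comment above, and the real file at https://arctos.database.museum/robots.txt may contain different rules.

```python
import urllib.robotparser

# Hypothetical robots.txt content for illustration only; the 10-second
# delay matches the comment above, the Disallow rule is made up.
SAMPLE_ROBOTS = [
    "User-agent: *",
    "Crawl-delay: 10",
    "Disallow: /private/",
]


def build_parser(robots_lines):
    """Parse robots.txt lines that were already fetched (no network access here)."""
    parser = urllib.robotparser.RobotFileParser()
    parser.parse(robots_lines)
    return parser


def crawl_delay_for(parser, user_agent, default=1.0):
    """Return the requested crawl delay in seconds, or a default if none is set."""
    delay = parser.crawl_delay(user_agent)
    return float(delay) if delay is not None else default


parser = build_parser(SAMPLE_ROBOTS)
delay = crawl_delay_for(parser, "example-fetcher")
allowed = parser.can_fetch("example-fetcher", "/media/10562312")
```

A client would then call `time.sleep(delay)` between requests and skip any URL for which `can_fetch` returns `False`. To read the live file instead of sample lines, `RobotFileParser.set_url(...)` followed by `read()` does the fetching.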
Hey @dustymc - Thanks for manually editing your whitelist. I can confirm that the provided addresses now have access to the previously blocked content. The following successful curl requests can be seen:
I can see your point about resources. This is something that I've been bringing up, and hoping to discuss more, in various meetings: the (hidden) cost of keeping and transferring "heavy" content like images, especially when they are stored centrally.
The tools I am building package the metadata, including licensing, along with the images. The integrity of the resulting image corpus can be verified, so that you can trace exactly what was retrieved at what time, and what came before and after. Because the content is identified with hashes, the image corpus can be moved to other web locations (or offline storage) without compromising its integrity or the ability to verify that integrity. This also means that the copyright is associated with the exact image that was linked at some point in time. So, in theory, you can use this same technique (or other content fingerprinting techniques like spectral analysis / image statistics) to determine whether someone is using, or keeping a copy of, an image from your collection.
In theory this would allow storing the content in a decentralized manner (perhaps with the Internet Archive or the Library of Congress hosting the "heavy" images), while systems like Arctos provide the meaningful connections between the digital artifacts (copyright relations, associations with specimen records, etc.). Anyway, there is a lot to talk and think about, and I'd be open to having a live conversation about this.
The example image of the bat jaw that triggered this issue came up as I was showing a proof-of-concept to Kendra Phelps, a member of the EcoHealth Alliance and collaborator in a biodiversity data hub, at https://jhpoelen.nl/bats . In this example, only media associated with 100 specimens are shown, only some of which come from Arctos. Also, I'll make a note to figure out a way to look for robots.txt and adjust request behavior based on it. |
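The idea of identifying content by hash, so a copy can be verified wherever it is stored, can be sketched as follows. This is an illustration under stated assumptions, not the tool described above: the choice of SHA-256, the `sha256:` identifier prefix, and the `make_record` field names are all hypothetical.

```python
import hashlib


def content_id(data: bytes) -> str:
    """Derive an identifier from the bytes alone (assumed scheme: SHA-256)."""
    return "sha256:" + hashlib.sha256(data).hexdigest()


def make_record(url: str, data: bytes, license_url: str, retrieved_at: str) -> dict:
    """Tie the exact bytes to where and when they were retrieved, plus the license.

    Field names are hypothetical; retrieved_at is passed in (e.g. an ISO 8601
    timestamp) so that the record is reproducible.
    """
    return {
        "source": url,
        "retrieved_at": retrieved_at,
        "license": license_url,
        "content_id": content_id(data),
    }


def verify(data: bytes, record: dict) -> bool:
    """Check that a copy, from any location, matches the recorded identifier."""
    return content_id(data) == record["content_id"]
```

Because the identifier depends only on the bytes, a copy held by another institution or in offline storage can be checked against the record without contacting the original server.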
@dustymc thanks for helping to make your images accessible. Closing this issue for now. I am hoping to pick up the discussion around distributing content (and costs) so that images can be preserved across willing institutions and projects without impacting the integrity of the images. |
When programmatically accessing the image at http://arctos.database.museum/media/10562312?open , the image cannot be retrieved. However, when accessing the same URL via a browser, the image can be retrieved.
To Reproduce
Steps to reproduce the behavior:
curl -L "http://arctos.database.museum/media/10562312?open" > image.jpg
to save the image into the file "image.jpg"
Expected behavior
The referenced image is available via image.jpg .
Actual behavior
The referenced image is not available in image.jpg, but it is available in a browser (see attached).
For details on the error message, see bio-guoda/preston#132 .
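One reason the failure is easy to miss is that the curl command above writes whatever body the server returns into image.jpg, even when the response is an error page. A minimal sketch of a download that checks the status code and content type first, using only the Python standard library, could look like this. The helper names and the `example-fetcher/0.1` user agent are hypothetical.

```python
import urllib.error
import urllib.request


def is_image_response(status: int, content_type: str) -> bool:
    """Accept only a 200 response whose Content-Type is some image type."""
    return status == 200 and content_type.lower().startswith("image/")


def download_image(url: str, path: str, user_agent: str = "example-fetcher/0.1"):
    """Save the body to path only if it looks like an image.

    Returns (status, saved). Redirects are followed by default, similar to
    curl -L. Error statuses such as the 450 reported in this issue surface
    as HTTPError and are returned without writing a file.
    """
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    try:
        with urllib.request.urlopen(request, timeout=30) as response:
            status = response.status
            content_type = response.headers.get("Content-Type", "")
            body = response.read()
    except urllib.error.HTTPError as error:
        return error.code, False
    if not is_image_response(status, content_type):
        return status, False
    with open(path, "wb") as out:
        out.write(body)
    return status, True
```

Calling `download_image("http://arctos.database.museum/media/10562312?open", "image.jpg")` would then report the status code instead of silently saving an error page.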
Screenshots