Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Check the file type of the thumbnail before passing it to Photon #2086

Merged
merged 8 commits into from
May 16, 2023

Conversation

krysal
Copy link
Member

@krysal krysal commented May 12, 2023

Fixes

Fixes #2067 by @krysal

Description

To prevent known errors from Photon due to unsupported file types, this PR adds the following check before making the call to Photon:

  1. Get the thumbnail extension from the URL. We can't assume it'd be the same of the main image (in the case of images/), so try to guess it first from the URL. I believe it is worth trying this route, since this is where most of the cases will fall.
  2. If can't guess the extension from the URL then make a HEAD request to get the content-type of the file, caching the result
  3. If the media type is not supported by Photon then raise an exception: 415 Unsupported Media Type.

Testing Instructions

If you want to manually check, try replacing the first row (after the header) of the sample_data/sample_image.csv file with the following:

0e3315c5-3328-4a99-80ab-567ac32f685f,2022-12-21 17:29:54.000000+00,2022-12-21 17:29:54.000000+00,provider_api,wikimedia,wikimedia,,https://www.flickr.com/photos/54633257@N04/51745822704,https://upload.wikimedia.org/wikipedia/commons/9/92/Flower_sewing_pattern_outline.svg,,433,1024,97150,by-sa,2.0,Jooja,https://commons.wikimedia.org/wiki/User:Jooja,Flower sewing pattern outline,"{""views"": ""5991"", ""pub_date"": ""1639443671"", ""date_taken"": ""2021-07-20 18:19:25"", ""description"": ""dressed"", ""license_url"": ""https://creativecommons.org/licenses/by-sa/3.0/"", ""raw_license_url"": null}",,f,2021-12-15 20:49:19.976193+00,f,svg,illustration

The run just api/recreate and visit http://localhost:50280/v1/images/0e3315c5-3328-4a99-80ab-567ac32f685f/thumb/

You should see the new response message 'Unsupported media type "svg" in request.'

Checklist

  • My pull request has a descriptive title (not a vague title likeUpdate index.md).
  • My pull request targets the default branch of the repository (main) or a parent feature branch.
  • My commit messages follow best practices.
  • My code follows the established code style of the repository.
  • I added or updated tests for the changes I made (if applicable).
  • I added or updated documentation (if applicable).
  • I tried running the project locally and verified that there are no visible errors.
  • I ran the DAG documentation generator (if applicable).

Developer Certificate of Origin

Developer Certificate of Origin
Developer Certificate of Origin
Version 1.1

Copyright (C) 2004, 2006 The Linux Foundation and its contributors.
1 Letterman Drive
Suite D4700
San Francisco, CA, 94129

Everyone is permitted to copy and distribute verbatim copies of this
license document, but changing it is not allowed.


Developer's Certificate of Origin 1.1

By making a contribution to this project, I certify that:

(a) The contribution was created in whole or in part by me and I
    have the right to submit it under the open source license
    indicated in the file; or

(b) The contribution is based upon previous work that, to the best
    of my knowledge, is covered under an appropriate open source
    license and I have the right under that license to submit that
    work with modifications, whether created in whole or in part
    by me, under the same open source license (unless I am
    permitted to submit under a different license), as indicated
    in the file; or

(c) The contribution was provided directly to me by some other
    person who certified (a), (b) or (c) and I have not modified
    it.

(d) I understand and agree that this project and the contribution
    are public and that a record of the contribution (including all
    personal information I submit with it, including my sign-off) is
    maintained indefinitely and may be redistributed consistent with
    this project or the open source license(s) involved.

@krysal krysal added 🟧 priority: high Stalls work on the project or its dependents ✨ goal: improvement Improvement to an existing user-facing feature 💻 aspect: code Concerns the software code in the repository 🧱 stack: api Related to the Django API labels May 12, 2023
@krysal krysal force-pushed the fix/unsupported_types_photon branch from 60d3e29 to 83b5ace Compare May 12, 2023 20:04
@krysal krysal force-pushed the fix/unsupported_types_photon branch from 5e326ad to 7cdfeea Compare May 15, 2023 20:21
@krysal krysal marked this pull request as ready for review May 15, 2023 20:30
@krysal krysal requested a review from a team as a code owner May 15, 2023 20:30
@krysal krysal requested review from dhruvkb and sarayourfriend May 15, 2023 20:30
@krysal krysal requested review from stacimc and removed request for sarayourfriend May 15, 2023 20:31
Copy link
Collaborator

@AetherUnbound AetherUnbound left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code and tests look good, great call on adding caching at various levels! I have a few suggestions/questions but nothing blocking 🚀 🖼️

api/api/utils/photon.py Outdated Show resolved Hide resolved
api/api/utils/photon.py Outdated Show resolved Hide resolved
api/api/utils/photon.py Outdated Show resolved Hide resolved
Copy link
Collaborator

@stacimc stacimc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM and tested well! I appreciated the pro-active work to reduce Sentry messages 🚀

api/api/utils/photon.py Outdated Show resolved Hide resolved
@AetherUnbound
Copy link
Collaborator

Oh, one other thing I was curious about! I know we do the dead link filtering with using async to parallelize requests:

async def _make_head_requests(urls: list[str]) -> list[tuple[str, int]]:

I think that's because they're all happening as part of one request, whereas queries for the thumbnail are each their own request, right? That to say, do we need to try and parallelize this or is it okay to make the HEAD request as part of the request since all thumbnail requests will be issued separately, at the same time?

@krysal krysal force-pushed the fix/unsupported_types_photon branch from 7cdfeea to e8ffe3a Compare May 16, 2023 15:34
Co-authored-by: Madison Swain-Bowden <bowdenm@spu.edu>
@krysal krysal force-pushed the fix/unsupported_types_photon branch from e8ffe3a to 9d76006 Compare May 16, 2023 15:36
krysal and others added 2 commits May 16, 2023 11:40
Co-authored-by: Staci Mullins <63313398+stacimc@users.noreply.github.com>
@krysal krysal force-pushed the fix/unsupported_types_photon branch from 814c247 to 7994b8b Compare May 16, 2023 15:48
@krysal
Copy link
Member Author

krysal commented May 16, 2023

do we need to try and parallelize this or is it okay to make the HEAD request as part of the request since all thumbnail requests will be issued separately, at the same time?
@AetherUnbound

I'm not sure I fully understand the question 🤔 These are all individual request so I don't see how a group of thumbnails requests can be parallelized in this point. We could create another endpoint that receives a group of media IDs to return their thumbnails, and there they can be parallelized, but here it might be a simple request of the link returned from /<media>/<uuid>/ view.


Thank you all for the suggestions!

@krysal krysal merged commit 8bc6ab7 into main May 16, 2023
@krysal krysal deleted the fix/unsupported_types_photon branch May 16, 2023 16:00
@AetherUnbound
Copy link
Collaborator

Got it, just got myself mixed up and had to talk through it to understand - thanks @krysal!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
💻 aspect: code Concerns the software code in the repository ✨ goal: improvement Improvement to an existing user-facing feature 🟧 priority: high Stalls work on the project or its dependents 🧱 stack: api Related to the Django API
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

Do not send svg images to photon for thumbnails
3 participants