-
Notifications
You must be signed in to change notification settings - Fork 183
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fallback to screenshots or opengraph if possible in media resolution #452
Changes from 10 commits
8181a0e
37aa259
a5723da
ffab45c
569c8ea
96379dc
4e769e9
38870cd
823511f
f94d9e8
0c0d9aa
95a17a7
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -34,7 +34,26 @@ defmodule Ret.MediaResolver do | |
def resolve(%MediaResolverQuery{url: url} = query) when is_binary(url) do | ||
uri = url |> URI.parse() | ||
root_host = get_root_host(uri.host) | ||
resolve(query |> Map.put(:url, uri), root_host) | ||
query = Map.put(query, :url, uri) | ||
|
||
# TODO: We could end up running fallback_to_screenshot_opengraph_or_nothing | ||
# twice in a row. These resolve functions can be simplified so that we can | ||
# more easily track individual failures and only fallback when necessary. | ||
# Also make sure they have a uniform response shape for indicating an | ||
# error. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This is a tech debt concern that I do not want to cover in this PR, (and I am not sure I'm right), but it might be worth changing in the future: I think what we might want to do is:
From https://hexdocs.pm/cachex/Cachex.html#fetch/4 This may not be necessary, but it seems to me like it might be clearer to explicitly return one of:
We should do our best differentiate between the return values of case Cachex.fetch(:media_urls, query) do
{:error, _} -> # This is a Cachex error!
{_status, {:error, _reason} } -> # This is a cached error we put in ourselves
end
See this commit for details ffab45c#diff-41751f7931f9a08bf162d96c927b4c6872c21b2d215916a5b0f8041848d08152R119 This might be easier to distinguish if we create a There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This sounds correct to me. Mixing the fallback stuff in with resolving is confusing. We should have resolving return an ok/error result like you suggest, and then use that in cachex fallback. Might be nice for the cachex part not even to live in this file. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Captured this in #454 for addressing at a later date |
||
case resolve(query, root_host) do | ||
:error -> | ||
fallback_to_screenshot_opengraph_or_nothing(query) | ||
|
||
{:error, _reason} -> | ||
fallback_to_screenshot_opengraph_or_nothing(query) | ||
|
||
{:commit, nil} -> | ||
fallback_to_screenshot_opengraph_or_nothing(query) | ||
|
||
commit -> | ||
commit | ||
end | ||
end | ||
|
||
def resolve(%MediaResolverQuery{url: %URI{host: nil}}, _root_host) do | ||
|
@@ -332,15 +351,20 @@ defmodule Ret.MediaResolver do | |
end) | ||
end | ||
|
||
defp resolve_non_video(%MediaResolverQuery{url: %URI{host: host} = uri, version: version}, _root_host) do | ||
defp resolve_non_video(%MediaResolverQuery{} = query, _root_host) do | ||
fallback_to_screenshot_opengraph_or_nothing(query) | ||
end | ||
|
||
# TODO: Refactor this function | ||
defp fallback_to_screenshot_opengraph_or_nothing(%MediaResolverQuery{url: %URI{host: host} = uri, version: version}) do | ||
photomnemonic_endpoint = module_config(:photomnemonic_endpoint) | ||
|
||
# Crawl og tags for hubs rooms + scenes | ||
is_local_url = host === RetWeb.Endpoint.host() | ||
|
||
case uri |> URI.to_string() |> retry_head_then_get_until_success([{"Range", "bytes=0-32768"}]) do | ||
:error -> | ||
nil | ||
:error | ||
|
||
%HTTPoison.Response{headers: headers} -> | ||
content_type = headers |> content_type_from_headers | ||
|
@@ -376,7 +400,7 @@ defmodule Ret.MediaResolver do | |
|
||
case Download.from(url, path: path) do | ||
{:ok, _path} -> {:ok, %{content_type: "image/png"}} | ||
error -> {:error, error} | ||
_error -> :error | ||
end | ||
end | ||
) | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -74,6 +74,14 @@ defmodule RetWeb.Api.V1.MediaController do | |
end | ||
|
||
defp resolve_and_render(conn, url, version, quality \\ nil) do | ||
query = query_for(conn, url, version, quality) | ||
value = Cachex.fetch(:media_urls, query) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. You could pipeline value through here (would need to change This definitely reads more clearly than what we had before. Also, we didn't have it before but might be worth adding a note that There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. RE: adding a note about I could be persuaded into adding a note (and I did not know this behavior before now either), but since it's used in several places in the client I am also OK with having people find out about this by consulting the docs. I don't feel strongly about this though so I'll add it if you think it's helpful ./lib/ret/slack_client.ex:78: case Cachex.fetch(
./lib/ret/slack_client.ex:93: case Cachex.fetch(
./lib/ret/slack_client.ex:105: case Cachex.fetch(
./lib/ret/app_config.ex:49: Cachex.fetch(:app_config, "")
./lib/ret/app_config.ex:117: case Cachex.fetch(:app_config_value, key) do
./lib/ret/app_config.ex:126: case Cachex.fetch(:app_config_owned_file_uri, key) do
./lib/ret/discord_client.ex:66: case Cachex.fetch(:discord_api, "/guilds/#{community_id}/members/#{provider_account_id}") do
./lib/ret/discord_client.ex:71: case Cachex.fetch(:discord_api, "/users/#{provider_account_id}") do
./lib/ret/discord_client.ex:80: case Cachex.fetch(:discord_api, "/users/#{provider_account_id}") do
./lib/ret/discord_client.ex:138: case Cachex.fetch(:discord_api, "/guilds/#{community_id}") do
./lib/ret/discord_client.ex:146: case Cachex.fetch(:discord_api, "/guilds/#{community_id}/roles") do
./lib/ret/discord_client.ex:172: case Cachex.fetch(:discord_api, "/channels/#{channel_id}") do
./lib/ret/discord_client.ex:212: case Cachex.fetch(:discord_api, "/guilds/#{community_id}/members/#{discord_user_id}") do
./lib/ret/coturn.ex:21: {_, coturn_secret} = Cachex.fetch(:coturn_secret, :coturn_secret)
./lib/ret_web/controllers/api/v1/media_search_controller.ex:86: case Cachex.fetch(cache_for_query(query), query) do
./lib/ret_web/controllers/api/v1/media_controller.ex:78: value = Cachex.fetch(:media_urls, query) |
||
maybe_do_telemetry(value) | ||
maybe_bump_ttl(value, query) | ||
render_resolved_media_or_error(conn, value) | ||
end | ||
|
||
defp query_for(conn, url, version, quality) do | ||
quality = quality || default_quality(conn) | ||
|
||
ua = | ||
|
@@ -84,43 +92,12 @@ defmodule RetWeb.Api.V1.MediaController do | |
|
||
supports_webm = ua.family != "Safari" && ua.family != "Mobile Safari" | ||
|
||
query = %Ret.MediaResolverQuery{ | ||
%Ret.MediaResolverQuery{ | ||
url: url, | ||
supports_webm: supports_webm, | ||
quality: quality, | ||
version: version | ||
} | ||
|
||
case Cachex.fetch(:media_urls, query) do | ||
{_status, nil} -> | ||
Statix.increment("ret.media_resolver.404") | ||
conn |> send_resp(404, "") | ||
|
||
{_status, %Ret.ResolvedMedia{ttl: ttl} = resolved_media} -> | ||
if ttl do | ||
Cachex.expire(:media_urls, query, :timer.seconds(ttl / 1000)) | ||
end | ||
|
||
Statix.increment("ret.media_resolver.ok") | ||
render_resolved_media(conn, resolved_media) | ||
|
||
{:error, e} -> | ||
Statix.increment("ret.media_resolver.500") | ||
conn |> send_resp(500, e) | ||
|
||
_ -> | ||
Statix.increment("ret.media_resolver.500") | ||
conn |> send_resp(500, "Error resolving media") | ||
end | ||
end | ||
|
||
defp render_resolved_media(conn, %Ret.ResolvedMedia{uri: uri, audio_uri: audio_uri, meta: meta}) | ||
when audio_uri != nil do | ||
conn |> render("show.json", origin: uri |> URI.to_string(), origin_audio: audio_uri |> URI.to_string(), meta: meta) | ||
end | ||
|
||
defp render_resolved_media(conn, %Ret.ResolvedMedia{uri: uri, meta: meta}) do | ||
conn |> render("show.json", origin: uri |> URI.to_string(), meta: meta) | ||
end | ||
|
||
defp default_quality(conn) do | ||
|
@@ -136,4 +113,59 @@ defmodule RetWeb.Api.V1.MediaController do | |
:high | ||
end | ||
end | ||
|
||
defp maybe_do_telemetry({:commit, nil}), do: Statix.increment("ret.media_resolver.404") | ||
defp maybe_do_telemetry({:commit, %Ret.ResolvedMedia{}}), do: Statix.increment("ret.media_resolver.ok") | ||
defp maybe_do_telemetry({:error, _reason}), do: Statix.increment("ret.media_resolver.unknown_error") | ||
defp maybe_do_telemetry({:commit, :error}), do: Statix.increment("ret.media_resolver.500") | ||
defp maybe_do_telemetry({:commit, {:error, _reason}}), do: Statix.increment("ret.media_resolver.500") | ||
defp maybe_do_telemetry(_), do: nil | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I only do telemetry on the |
||
|
||
defp maybe_bump_ttl({_status, %Ret.ResolvedMedia{ttl: ttl}}, query) do | ||
if ttl do | ||
Cachex.expire(:media_urls, query, :timer.seconds(ttl / 1000)) | ||
end | ||
end | ||
|
||
defp maybe_bump_ttl(_value, _query), do: nil | ||
|
||
defp render_resolved_media_or_error(conn, {_status, nil}) do | ||
send_resp(conn, 404, "") | ||
end | ||
|
||
defp render_resolved_media_or_error(conn, {_status, %Ret.ResolvedMedia{} = resolved_media}) do | ||
render_resolved_media(conn, resolved_media) | ||
end | ||
|
||
# This is an error response that we have cached ourselves | ||
defp render_resolved_media_or_error(conn, {_status, :error}) do | ||
send_resp(conn, 500, "An error occured during media resolution") | ||
end | ||
|
||
# This is an error response that we have cached ourselves | ||
defp render_resolved_media_or_error(conn, {_status, {:error, _reason}}) do | ||
send_resp(conn, 500, "An error occured during media resolution") | ||
end | ||
|
||
# This is an unexpected error response from Cachex | ||
defp render_resolved_media_or_error(conn, {:error, _reason}) do | ||
Statix.increment("ret.media_resolver.unknown_cachex_error") | ||
send_resp(conn, 500, "An unexpected error occurred during media resolution.") | ||
end | ||
|
||
# This is an unexpected response from Cachex | ||
defp render_resolved_media_or_error(conn, _) do | ||
# We do not expect this code to run, so if it happens, something went wrong | ||
Statix.increment("ret.media_resolver.unknown_error") | ||
send_resp(conn, 500, "An unexpected error occurred during media resolution.") | ||
end | ||
|
||
defp render_resolved_media(conn, %Ret.ResolvedMedia{uri: uri, audio_uri: audio_uri, meta: meta}) | ||
when audio_uri != nil do | ||
conn |> render("show.json", origin: uri |> URI.to_string(), origin_audio: audio_uri |> URI.to_string(), meta: meta) | ||
end | ||
|
||
defp render_resolved_media(conn, %Ret.ResolvedMedia{uri: uri, meta: meta}) do | ||
conn |> render("show.json", origin: uri |> URI.to_string(), meta: meta) | ||
end | ||
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this is correct, but it seems like we need to spoof a User-Agent for certain sites to respond to the requests we make for screenshot / opengraph resolution (the ones with the
Range
headers below https://github.com/mozilla/reticulum/pull/452/files#diff-466e850938cfeaae016678e1ab2b328f907c2b5c7620a19ded2470dfad504b54R365 )There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems sane to spoof a normal browser, what specific string we use doesn't really matter much I don't think. Might want to use a Windows one in case the linux one confuses some random website.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Replaced the linux one with a Windows + Firefox one:
Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:84.0) Gecko/20100101 Firefox/84.0