Fallback to screenshots or opengraph if possible in media resolution #452

johnshaughnessy · 2021-01-12T00:25:52Z

johnshaughnessy · 2021-01-13T01:01:39Z

lib/ret/media_resolver.ex

+    #       twice in a row. These resolve functions can be simplified so that we can
+    #       more easily track individual failures and only fallback when necessary.
+    #       Also make sure they have a uniform response shape for indicating an
+    #       error.


This is a tech debt concern that I do not want to cover in this PR, (and I am not sure I'm right), but it might be worth changing in the future:

I think what we might want to do is:

Refactor resolve/2, resolve_with_ytdl, and resolve_non_video so that they return {:ok, %ResolvedMedia{}} | {:error, reason} instead of a return value for the Cachex fallback function.

In the outer function (resolve/1), match on the {:ok, %ResolvedMedia{}} | {:error, reason} tuples (from the functions above) using case to decide

whether to try to fallback to screenshots/opengraph,

whether to increment Statix metrics or other telemetry,

what we actually want to store in the cache in the event of an error (e.g. details about what went wrong so that we can return that info to the client)

From the outer function (resolve/1), return a tuple for the Cachex fallback function with a status of :commit or :ignore:

A fallback function should return a Tuple consisting of a :commit or :ignore tag and a value. If the Tuple is tagged :commit the value will be placed into the cache and then returned. If tagged :ignore the value will be returned without being written to the cache. If you return a value which does not fit this structure, it will be assumed that you are committing the value.

From https://hexdocs.pm/cachex/Cachex.html#fetch/4

This may not be necessary, but it seems to me like it might be clearer to explicitly return one of:

{:commit, resolved_media}

{:commit, {:error, reason}}

~~{:commit, nil}~~ (Don't use nil to indicate 404 -- instead use something like {:commit, {:error, 404}}

{:ignore, some_uncached_value}

We should do our best differentiate between the return values of Cachex.fetch(:media_urls, query) responses. Specifically,

case Cachex.fetch(:media_urls, query) do {:error, _} -> # This is a Cachex error! {_status, {:error, _reason} } -> # This is a cached error we put in ourselves end

See this commit for details ffab45c#diff-41751f7931f9a08bf162d96c927b4c6872c21b2d215916a5b0f8041848d08152R119

This might be easier to distinguish if we create a ResolvedMediaError struct to put in the cache so that we don't mistake the difference between the two.

This sounds correct to me. Mixing the fallback stuff in with resolving is confusing. We should have resolving return an ok/error result like you suggest, and then use that in cachex fallback. Might be nice for the cachex part not even to live in this file.

Captured this in #454 for addressing at a later date

johnshaughnessy · 2021-01-13T01:03:00Z

lib/ret_web/controllers/api/v1/media_controller.ex

+  defp maybe_do_telemetry({:error, _reason}), do: Statix.increment("ret.media_resolver.unknown_error")
+  defp maybe_do_telemetry({:commit, :error}), do: Statix.increment("ret.media_resolver.500")
+  defp maybe_do_telemetry({:commit, {:error, _reason}}), do: Statix.increment("ret.media_resolver.500")
+  defp maybe_do_telemetry(_), do: nil


I only do telemetry on the {:commit, _} tuples because that indicates when they are first entered into the cache. This way, we don't log 100 404's if there's 1 resource that 100 people try to reference because the media resolution only really failed once.

johnshaughnessy · 2021-01-13T01:05:40Z

lib/ret/http_utils.ex

@@ -24,6 +24,8 @@ defmodule Ret.HttpUtils do
  end

  def retry_until_success(verb, url, body \\ "", headers \\ [], cap_ms \\ 5_000, expiry_ms \\ 10_000) do
+    headers = headers ++ [{"User-Agent", "Mozilla/5.0 (X11; Linux x86_64; rv:84.0) Gecko/20100101 Firefox/84.0"}]


I don't think this is correct, but it seems like we need to spoof a User-Agent for certain sites to respond to the requests we make for screenshot / opengraph resolution (the ones with the Range headers below https://github.com/mozilla/reticulum/pull/452/files#diff-466e850938cfeaae016678e1ab2b328f907c2b5c7620a19ded2470dfad504b54R365 )

Seems sane to spoof a normal browser, what specific string we use doesn't really matter much I don't think. Might want to use a Windows one in case the linux one confuses some random website.

Replaced the linux one with a Windows + Firefox one:
Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:84.0) Gecko/20100101 Firefox/84.0

netpro2k

lgtm

netpro2k · 2021-01-14T00:39:04Z

lib/ret/http_utils.ex

@@ -24,6 +24,8 @@ defmodule Ret.HttpUtils do
  end

  def retry_until_success(verb, url, body \\ "", headers \\ [], cap_ms \\ 5_000, expiry_ms \\ 10_000) do
+    headers = headers ++ [{"User-Agent", "Mozilla/5.0 (X11; Linux x86_64; rv:84.0) Gecko/20100101 Firefox/84.0"}]


Seems sane to spoof a normal browser, what specific string we use doesn't really matter much I don't think. Might want to use a Windows one in case the linux one confuses some random website.

netpro2k · 2021-01-14T00:51:24Z

lib/ret_web/controllers/api/v1/media_controller.ex

@@ -74,6 +74,14 @@ defmodule RetWeb.Api.V1.MediaController do
  end

  defp resolve_and_render(conn, url, version, quality \\ nil) do
+    query = query_for(conn, url, version, quality)
+    value = Cachex.fetch(:media_urls, query)


You could pipeline value through here (would need to change render_resolved_media_or_error)... Not sure if it would actually be clearer since value is not being transformed anyway...

This definitely reads more clearly than what we had before.

Also, we didn't have it before but might be worth adding a note that Cachex.fetch is what actually kicks off the resolution if its not in the cache.

RE: adding a note about Cachex.fetch.

I could be persuaded into adding a note (and I did not know this behavior before now either), but since it's used in several places in the client I am also OK with having people find out about this by consulting the docs. I don't feel strongly about this though so I'll add it if you think it's helpful

./lib/ret/slack_client.ex:78: case Cachex.fetch( ./lib/ret/slack_client.ex:93: case Cachex.fetch( ./lib/ret/slack_client.ex:105: case Cachex.fetch( ./lib/ret/app_config.ex:49: Cachex.fetch(:app_config, "") ./lib/ret/app_config.ex:117: case Cachex.fetch(:app_config_value, key) do ./lib/ret/app_config.ex:126: case Cachex.fetch(:app_config_owned_file_uri, key) do ./lib/ret/discord_client.ex:66: case Cachex.fetch(:discord_api, "/guilds/#{community_id}/members/#{provider_account_id}") do ./lib/ret/discord_client.ex:71: case Cachex.fetch(:discord_api, "/users/#{provider_account_id}") do ./lib/ret/discord_client.ex:80: case Cachex.fetch(:discord_api, "/users/#{provider_account_id}") do ./lib/ret/discord_client.ex:138: case Cachex.fetch(:discord_api, "/guilds/#{community_id}") do ./lib/ret/discord_client.ex:146: case Cachex.fetch(:discord_api, "/guilds/#{community_id}/roles") do ./lib/ret/discord_client.ex:172: case Cachex.fetch(:discord_api, "/channels/#{channel_id}") do ./lib/ret/discord_client.ex:212: case Cachex.fetch(:discord_api, "/guilds/#{community_id}/members/#{discord_user_id}") do ./lib/ret/coturn.ex:21: {_, coturn_secret} = Cachex.fetch(:coturn_secret, :coturn_secret) ./lib/ret_web/controllers/api/v1/media_search_controller.ex:86: case Cachex.fetch(cache_for_query(query), query) do ./lib/ret_web/controllers/api/v1/media_controller.ex:78: value = Cachex.fetch(:media_urls, query)

netpro2k · 2021-01-14T00:54:11Z

lib/ret/media_resolver.ex

+    #       twice in a row. These resolve functions can be simplified so that we can
+    #       more easily track individual failures and only fallback when necessary.
+    #       Also make sure they have a uniform response shape for indicating an
+    #       error.


This sounds correct to me. Mixing the fallback stuff in with resolving is confusing. We should have resolving return an ok/error result like you suggest, and then use that in cachex fallback. Might be nice for the cachex part not even to live in this file.

johnshaughnessy added 2 commits January 11, 2021 15:51

Refactor resolve_and_render

8181a0e

Fallback to screenshots / opengraph

37aa259

johnshaughnessy changed the title ~~Feature/media resolve fallback to screenshots~~ Fallback to screenshots or opengraph if possible in media resolution Jan 12, 2021

johnshaughnessy added 8 commits January 11, 2021 16:32

Fix compilation warning: default arguments are never used

a5723da

Match :error tuples more carefully

ffab45c

lint

569c8ea

lint...

96379dc

Cache the errors as :error atoms

4e769e9

My local linter is not showing me errors... :(

38870cd

Remove an unnecessary branch

823511f

Send fake user-agent to fix range queries

f94d9e8

johnshaughnessy commented Jan 13, 2021

View reviewed changes

netpro2k approved these changes Jan 14, 2021

View reviewed changes

Use user agent from windows firefox

0c0d9aa

johnshaughnessy mentioned this pull request Jan 15, 2021

Separate media resolution attempts from Cachex behavior #454

Open

lint

95a17a7

johnshaughnessy merged commit 4630a3d into master Jan 15, 2021

johnshaughnessy deleted the feature/media-resolve-fallback-to-screenshots branch January 15, 2021 21:33

This was referenced Jan 15, 2021

Investigate specific failures of media resolution #455

Open

Media resolver suggestion: screenshot fallback for failed media resolution #435

Closed

This was referenced Mar 5, 2021

April 2021 Hubs Cloud Update #479

Merged

April 2021 Hubs Cloud Update #482

Merged

brianpeiris mentioned this pull request Nov 16, 2021

Fix Discord integration #531

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fallback to screenshots or opengraph if possible in media resolution #452

Fallback to screenshots or opengraph if possible in media resolution #452

johnshaughnessy commented Jan 12, 2021 •

edited

Loading

johnshaughnessy Jan 13, 2021

netpro2k Jan 14, 2021

johnshaughnessy Jan 15, 2021

johnshaughnessy Jan 13, 2021 •

edited

Loading

johnshaughnessy Jan 13, 2021

netpro2k Jan 14, 2021

johnshaughnessy Jan 15, 2021

netpro2k left a comment

netpro2k Jan 14, 2021

netpro2k Jan 14, 2021

johnshaughnessy Jan 15, 2021

netpro2k Jan 14, 2021

Fallback to screenshots or opengraph if possible in media resolution #452

Fallback to screenshots or opengraph if possible in media resolution #452

Conversation

johnshaughnessy commented Jan 12, 2021 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

johnshaughnessy Jan 13, 2021 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

netpro2k left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

johnshaughnessy commented Jan 12, 2021 •

edited

Loading

johnshaughnessy Jan 13, 2021 •

edited

Loading