https://freebies.indiegala.com doesn't show any results from Sopel bot. #1895

Closed
antdude opened this issue Jun 23, 2020 · 13 comments
Labels
Not Us Issues that are not Sopel's responsibility, e.g. a bug in the environment or a dependency

Comments

@antdude

antdude commented Jun 23, 2020

Description

https://freebies.indiegala.com doesn't show any results from Sopel bot.

Reproduction steps

Paste the URL https://freebies.indiegala.com into an IRC channel with a Sopel bot.

Expected behavior

I should see Sopel visit it and give me details.

Logs

Nothing interesting in my logs.

Environment

  • Sopel v7.0.4 (Python 3.4.2)
  • Installed via pip
  • Python v3.4.2
  • Debian Jessie v8
  • /version:
    BitchX: Client: BitchX-1.2.1 (internal version 20141114)
    ircd-ratbox-3.0.8(20121231_3-27427).: irc1.zimage.com eGHIKMpZ6 TS6ow 1ZI
    CHANTYPES=&# EXCEPTS INVEX CHANMODES=eIb,k,l,imnpstS CHANLIMIT=&#:15
    PREFIX=(ov)@+ MAXLIST=beI:25 MODES=4 NETWORK=zImage KNOCK STATUSMSG=@+
    CALLERID=g :are supported by this server
    SAFELIST ELIST=U CASEMAPPING=rfc1459 CHARSET=ascii NICKLEN=32 CHANNELLEN=50
    TOPICLEN=160 ETRACE CPRIVMSG CNOTICE DEAF=D MONITOR=60 are supported by
    this server
    FNC TARGMAX=NAMES:1,LIST:1,KICK:1,WHOIS:1,PRIVMSG:4,NOTICE:4,ACCEPT:,MONITOR:
    :are supported by this server

Notes

https://www.indiegala.com/ worked though:
[02:51pm] Ant> https://www.indiegala.com/
02:52PM URL> [ IndieGala | Buy PC Games, Steam Keys, Bundles, Steam downloads ] -
www.indiegala.com

Not sure if it is my very old setup or an actual bug in Sopel with this URL. Thank you for reading. :)

antdude added the Bug label (Things to squish; generally used for issues) on Jun 23, 2020
@RhinosF1
Contributor

I get no response to either URL on my setup. GitHub URLs work fine.

I don't think the urls module is doing anything magical though.

@dgw
Member

dgw commented Jun 23, 2020

Please test current master. Title fetching has been modified since 7.0.4.

@RustyBower
Contributor

RustyBower commented Jun 23, 2020 via email

@antdude
Author

antdude commented Jun 23, 2020

If that is the case, shouldn't Sopel retry or report a timeout error?

@dgw
Member

dgw commented Jun 23, 2020

I'm gonna say this isn't our problem. We can't force any site to send Sopel clean HTML if it doesn't want to.

Title fetching is a best-effort feature, and we know there are always going to be factors outside Sopel's control that can affect it. Maybe the IP range a bot connects from is banned due to DDoS protection. Maybe the remote site uses heuristics to detect automated requests (again, for DDoS protection). There are so many variables, and in most cases there's literally nothing Sopel could do short of running a whole browser emulation à la Selenium—way overkill for a convenience feature like this. (There's nothing stopping anyone from implementing an alternative title fetching plugin that uses browser emulation, anyway. We just won't include it in core because of the heavy dependencies.)
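For readers wondering what "best effort" can look like in practice, here is a minimal Python sketch. It is not Sopel's actual code: `fetch_html`, `try_fetch`, the 5-second timeout, the 64 KiB read cap, and the User-Agent string are all assumptions made up for illustration. The idea is only that every network failure (timeout, refused connection, HTTP error, malformed URL) collapses into "no page", so the caller can decide whether to stay silent or report the problem.

```python
import http.client
import urllib.request
from typing import Callable, Optional, Tuple

# Hypothetical names throughout; none of this is Sopel's real API.
# URLError, HTTPError, and socket timeouts are all subclasses of OSError.
FETCH_ERRORS = (OSError, ValueError, http.client.HTTPException)


def fetch_html(url: str, timeout: float = 5.0) -> str:
    """Download at most 64 KiB of a page; raises on any network problem."""
    request = urllib.request.Request(
        url, headers={"User-Agent": "title-sketch/0.1"}  # assumed UA string
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read(65536).decode("utf-8", errors="replace")


def try_fetch(
    url: str, fetcher: Callable[[str], str] = fetch_html
) -> Tuple[Optional[str], Optional[str]]:
    """Return (body, None) on success or (None, error_name) on failure."""
    try:
        return fetcher(url), None
    except FETCH_ERRORS as exc:
        # Best effort: swallow the failure and hand back only its name.
        return None, type(exc).__name__
```

A caller handling automatic title fetching could simply ignore the error name, while one handling an explicit command could echo it back to the channel. Note that a site which blocks the bot but still answers with an HTML page (a challenge page, for example) would not raise anything here, which is part of why this kind of feature can only ever be best effort.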

dgw added the Not Us label (Issues that are not Sopel's responsibility, e.g. a bug in the environment or a dependency) and removed the Bug label (Things to squish; generally used for issues) on Jun 23, 2020
@dgw dgw closed this as completed Jun 23, 2020
@dgw
Member

dgw commented Jun 23, 2020

@antdude Further IRC discussion led us to find that the error Sopel used to output when .title is used (not automatic title fetching) no longer happens since #1510. So, we'll have a regression fix for 7.1 🤞

@antdude
Author

antdude commented Jun 27, 2020

Is this the same title issue for the http://flickmetrix.com website too?

@dgw
Member

dgw commented Jun 27, 2020

No. That site just plain doesn't include a <title> element in the HTML at all, because it's lazy or something (snark level: DONE with websites and their shit).

We talked on IRC about adding support for meta tags, but even that won't really help in this case. Even though it does exist, the title meta tag on this site is not customized for the page or item requested. Every page will show the same generic title as the homepage.

@antdude
Author

antdude commented Jun 27, 2020

Thank you for the quick answer, dgw. Wow, no old-fashioned HTML title tags. How are the web browsers adding titles?!

@dgw
Member

dgw commented Jun 27, 2020

JavaScript. Read the page source yourself.

@antdude
Author

antdude commented Jun 27, 2020

I don't know JS. :(

@cottongin
Contributor

cottongin commented Jun 27, 2020

If anyone else reading this was/is interested in how this particular site is generating document/page titles for individual links like I was:

Excerpt from their JS:

    q.openSingleFilm = function(F) {
        var E = $(document).prop("title");
        var D = f.path();
        var G = d.director;
        var H = d.cast;
        f.search("director", null);
        f.search("cast", null);
        $(document).prop("title", F.Title + " - Is " + F.Title + " On Netflix?");
        $("meta[name=description]").remove();
        $("head").append('<meta name="description" content="Find out if ' + F.Title + " is on Netflix, and where else " + F.Title + ' is streaming online now.">');
        f.path("/", false);
        f.search("id", F.ID);
        var a = $(window).scrollTop();
        if (q.isMobile) {
            $("html").css("overflow-y", "hidden")
        }
        var v = j.open({
            animation: false,
            templateUrl: "templates/singleFilm.html?45512777",
            controller: "singleFilmController",
            windowClass: "singleFilmModal",
            size: "lg",
            resolve: {
                params: function() {
                    return {
                        film: F,
                        films: q.films,
                        updateWatch: q.updateWatch,
                        updateLike: q.updateLike,
                        updateRating: q.updateRating,
                        updateFavourite: q.updateFavourite,
                        updateSeen: q.updateSeen,
                        updateRecommend: q.updateRecommend,
                        openTrailer: q.openTrailer,
                        getAmazonLink: q.getAmazonLink,
                        getFilms: q.getFilms,
                        searchFilmsByDirector: q.searchFilmsByDirector,
                        searchFilmsByCast: q.searchFilmsByCast,
                        netflixRegion: q.filter.netflixRegion,
                        showLoginPrompt: q.showLoginPrompt
                    }
                }
            }
        });
        v.result["catch"](function() {
            f.search("id", null);
            f.search("director", G);
            f.search("cast", H);
            f.path(D, false);
            $(document).prop("title", E);
            $("html").css("overflow-y", "scroll");
            if (q.isMobile) {
                $("body").addClass("forcedRelative");
                $(window).scrollTop(a);
                setTimeout(function() {
                    $("body").removeClass("forcedRelative")
                }, 1000)
            }
        });
        v.result.then(function() {
            f.search("id", null);
            f.search("director", G);
            f.search("cast", H);
            f.path(D, false);
            $(document).prop("title", E);
            $("html").css("overflow-y", "scroll");
            if (q.isMobile) {
                $("body").addClass("forcedRelative");
                $(window).scrollTop(a);
                setTimeout(function() {
                    $("body").removeClass("forcedRelative")
                }, 1000)
            }
        });
        g.showAppFilm = true
    }

Also, as @dgw pointed out, my suggestion for adding <meta> tag processing in #1896 wouldn't work here because of the JS snippet above. Their site does include a <meta property="og:title"> tag, though, and perhaps it is better to return some title than no title at all.

Nothing specific to this issue, just musings that I thought should be recorded for posterity. Will collate all of these thoughts into my PR Soon™

Edit: To answer @antdude's specific question about how browsers are getting a title from this specific site: the site sets the title via JavaScript (document.title), which obviously isn't reflected in the raw HTML, so Sopel can't find a title to parse.
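To make the "some title is better than no title" idea concrete, here is a rough Python sketch of a parser that prefers the <title> element and falls back to <meta property="og:title"> only when no usable <title> text exists in the raw HTML. This is not the code from any Sopel PR; `TitleParser` and `find_title` are hypothetical names, and it uses only the standard library's html.parser. It never executes JavaScript, so a title set through document.title stays invisible to it.

```python
from html.parser import HTMLParser
from typing import List, Optional


class TitleParser(HTMLParser):
    """Collect <title> text and the first og:title meta content (sketch)."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.in_title = False
        self.title_parts: List[str] = []
        self.og_title: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self.in_title = True
        elif tag == "meta":
            attributes = dict(attrs)
            # Keep only the first og:title encountered.
            if attributes.get("property") == "og:title" and self.og_title is None:
                self.og_title = attributes.get("content")

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.title_parts.append(data)


def find_title(html: str) -> Optional[str]:
    """Return the <title> text, else og:title, else None."""
    parser = TitleParser()
    parser.feed(html)
    parser.close()
    # Collapse runs of whitespace so the result fits on one IRC line.
    title = " ".join("".join(parser.title_parts).split())
    if title:
        return title
    if parser.og_title:
        return " ".join(parser.og_title.split()) or None
    return None
```

As noted above, on a site whose og:title is identical on every page, this fallback would report the same generic title for every link, so it is a trade-off rather than a fix.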

@dgw
Member

dgw commented Jun 27, 2020

@cottongin I guess now we know why Google decided to just add JS execution to its crawler… Adding a meta tag with JavaScript, wow


5 participants