https://freebies.indiegala.com doesn't show any results from Sopel bot. #1895
Comments
I get no response to either URL on my setup. GitHub URLs work fine. I don't think the urls module is doing anything magical though. |
Please test current master. Title fetching has been modified since 7.0.4. |
It looks like this is just timing out in my testing as well. My guess is
they have some sort of DDoS protection/bot detection, and drop those
connections on the floor.
|
If that is the case, shouldn't Sopel retry or report a timeout error? |
I'm gonna say this isn't our problem. We can't force any site to send Sopel clean HTML if it doesn't want to. Title fetching is a best-effort feature, and we know there are always going to be factors outside Sopel's control that can affect it. Maybe the IP range a bot connects from is banned due to DDoS protection. Maybe the remote site uses heuristics to detect automated requests (again, for DDoS protection). There are so many variables, and in most cases there's literally nothing Sopel could do short of running a whole browser emulation à la Selenium—way overkill for a convenience feature like this. (There's nothing stopping anyone from implementing an alternative title fetching plugin that uses browser emulation, anyway. We just won't include it in core because of the heavy dependencies.) |
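To make the "best-effort" idea concrete, here is a minimal sketch of a fetch that gives up quietly when the remote site drops the connection or never answers. This is not Sopel's actual code: `fetch_html`, `MAX_BYTES`, the `opener` parameter, and the User-Agent string are all made up for illustration, and it uses only the Python standard library.

```python
import http.client
import urllib.request
from typing import Optional

# Hypothetical cap on how much of the page to read; not a Sopel setting.
MAX_BYTES = 512 * 1024


def fetch_html(url: str, timeout: float = 5.0,
               opener=urllib.request.urlopen) -> Optional[str]:
    """Return the page body as text, or None if the fetch fails for any reason.

    `opener` is injectable only so the function can be exercised offline.
    """
    try:
        request = urllib.request.Request(
            url,
            # Assumed User-Agent; some sites drop clients they see as bots.
            headers={"User-Agent": "title-fetch-sketch/0.1"},
        )
        with opener(request, timeout=timeout) as response:
            raw = response.read(MAX_BYTES)
    except (OSError, ValueError, http.client.HTTPException):
        # OSError covers URLError, timeouts, and dropped connections;
        # ValueError covers malformed URLs. Best effort: stay silent.
        return None
    return raw.decode("utf-8", errors="replace")
```

A caller that gets `None` back can simply say nothing in the channel, which matches the behaviour reported above. Whether to retry or announce a timeout instead is a policy choice, not something the fetch itself can decide.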
Is this title issue the same for http://flickmetrix.com web site too? |
No. That site just plain doesn't include a <title> tag. We talked on IRC about adding support for meta tags, but even that won't really help in this case. Even though it does exist, the title meta tag on this site is not customized for the page or item requested. Every page will show the same generic title as the homepage. |
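For reference, a meta-tag fallback could look roughly like the sketch below. It is not what Sopel does today. `TitleParser` and `best_title` are hypothetical names, and the set of meta keys checked (`og:title`, `twitter:title`, `title`) is my assumption about what would be worth looking at.

```python
from html.parser import HTMLParser
from typing import List, Optional

# Assumed set of meta keys that may carry a title.
META_TITLE_KEYS = ("og:title", "twitter:title", "title")


class TitleParser(HTMLParser):
    """Collect the <title> text and the first title-like <meta> tag."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self._in_title = False
        self._title_parts: List[str] = []
        self.meta_title: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and self.meta_title is None:
            key = attributes.get("property") or attributes.get("name") or ""
            if key.lower() in META_TITLE_KEYS:
                self.meta_title = attributes.get("content")

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self._title_parts.append(data)

    @property
    def title(self) -> Optional[str]:
        # Collapse runs of whitespace, as IRC output should be one line.
        text = " ".join("".join(self._title_parts).split())
        return text or None


def best_title(html: str) -> Optional[str]:
    """Prefer <title>; fall back to a title-like meta tag; else None."""
    parser = TitleParser()
    parser.feed(html)
    parser.close()
    meta = " ".join((parser.meta_title or "").split())
    return parser.title or meta or None
```

On a site like the one discussed here, the fallback would return the same generic meta title for every link, so it would produce output without producing anything useful.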
Thank you for the quick answer, dgw. Wow, no old-fashioned HTML title tags. How are the web browsers adding titles?! |
JavaScript. Read the page source yourself. |
I don't know JS. :( |
For anyone else reading this who, like me, is interested in how this particular site generates page titles for individual links, here is an excerpt from its JS:
q.openSingleFilm = function(F) {
    var E = $(document).prop("title");
    var D = f.path();
    var G = d.director;
    var H = d.cast;
    f.search("director", null);
    f.search("cast", null);
    $(document).prop("title", F.Title + " - Is " + F.Title + " On Netflix?");
    $("meta[name=description]").remove();
    $("head").append('<meta name="description" content="Find out if ' + F.Title + " is on Netflix, and where else " + F.Title + ' is streaming online now.">');
    f.path("/", false);
    f.search("id", F.ID);
    var a = $(window).scrollTop();
    if (q.isMobile) {
        $("html").css("overflow-y", "hidden")
    }
    var v = j.open({
        animation: false,
        templateUrl: "templates/singleFilm.html?45512777",
        controller: "singleFilmController",
        windowClass: "singleFilmModal",
        size: "lg",
        resolve: {
            params: function() {
                return {
                    film: F,
                    films: q.films,
                    updateWatch: q.updateWatch,
                    updateLike: q.updateLike,
                    updateRating: q.updateRating,
                    updateFavourite: q.updateFavourite,
                    updateSeen: q.updateSeen,
                    updateRecommend: q.updateRecommend,
                    openTrailer: q.openTrailer,
                    getAmazonLink: q.getAmazonLink,
                    getFilms: q.getFilms,
                    searchFilmsByDirector: q.searchFilmsByDirector,
                    searchFilmsByCast: q.searchFilmsByCast,
                    netflixRegion: q.filter.netflixRegion,
                    showLoginPrompt: q.showLoginPrompt
                }
            }
        }
    });
    v.result["catch"](function() {
        f.search("id", null);
        f.search("director", G);
        f.search("cast", H);
        f.path(D, false);
        $(document).prop("title", E);
        $("html").css("overflow-y", "scroll");
        if (q.isMobile) {
            $("body").addClass("forcedRelative");
            $(window).scrollTop(a);
            setTimeout(function() {
                $("body").removeClass("forcedRelative")
            }, 1000)
        }
    });
    v.result.then(function() {
        f.search("id", null);
        f.search("director", G);
        f.search("cast", H);
        f.path(D, false);
        $(document).prop("title", E);
        $("html").css("overflow-y", "scroll");
        if (q.isMobile) {
            $("body").addClass("forcedRelative");
            $(window).scrollTop(a);
            setTimeout(function() {
                $("body").removeClass("forcedRelative")
            }, 1000)
        }
    });
    g.showAppFilm = true
}
Also as @dgw pointed out, my suggestion for adding
Nothing precisely specific to this issue, but musings that I thought should be recorded for posterity. Will collate all of these thoughts into my PR Soon™
Edit: To answer @antdude's specific question about how browsers are getting a title from this specific site: they are setting the title via JavaScript |
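To illustrate why a plain HTTP client never sees that title, here is a small sketch. `STATIC_PAGE` is an invented stand-in for what the server sends (not the site's real markup), and `static_title` and `browser_title` are hypothetical helpers. Only the title format string is taken from the excerpt above.

```python
import re
from typing import Optional

# Invented stand-in for the served HTML: no <title>, just a generic meta
# tag and a script that would set the title in a real browser.
STATIC_PAGE = """<html><head>
<meta name="title" content="Generic homepage title">
<script>
  document.title = film.Title + " - Is " + film.Title + " On Netflix?";
</script>
</head><body></body></html>"""


def static_title(html: str) -> Optional[str]:
    """What a non-browser client can see: the literal <title> element."""
    match = re.search(r"<title\b[^>]*>(.*?)</title>", html,
                      re.IGNORECASE | re.DOTALL)
    return " ".join(match.group(1).split()) if match else None


def browser_title(film_title: str) -> str:
    """The title a browser ends up with after running the site's script."""
    return f"{film_title} - Is {film_title} On Netflix?"


print(static_title(STATIC_PAGE))  # None: the script is never executed
print(browser_title("Alien"))     # Alien - Is Alien On Netflix?
```

Closing that gap means executing the page's JavaScript, which is the browser-emulation route already ruled out for core.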
@cottongin I guess now we know why Google decided to just add JS execution to its crawler… Adding a meta tag with JavaScript, wow |
Description
https://freebies.indiegala.com doesn't show any results from Sopel bot.
Reproduction steps
Paste the URL https://freebies.indiegala.com into an IRC channel that has a Sopel bot.
Expected behavior
I should see Sopel visit it and give me details.
Logs
Environment
BitchX: Client: BitchX-1.2.1 (internal version 20141114)
ircd-ratbox-3.0.8(20121231_3-27427).: irc1.zimage.com eGHIKMpZ6 TS6ow 1ZI
CHANTYPES=&# EXCEPTS INVEX CHANMODES=eIb,k,l,imnpstS CHANLIMIT=&#:15
PREFIX=(ov)@+ MAXLIST=beI:25 MODES=4 NETWORK=zImage KNOCK STATUSMSG=@+
CALLERID=g :are supported by this server
SAFELIST ELIST=U CASEMAPPING=rfc1459 CHARSET=ascii NICKLEN=32 CHANNELLEN=50
TOPICLEN=160 ETRACE CPRIVMSG CNOTICE DEAF=D MONITOR=60 are supported by
this server
FNC TARGMAX=NAMES:1,LIST:1,KICK:1,WHOIS:1,PRIVMSG:4,NOTICE:4,ACCEPT:,MONITOR:
:are supported by this server
Notes
https://www.indiegala.com/ worked though:
[02:51pm] Ant> https://www.indiegala.com/
02:52PM URL> [ IndieGala | Buy PC Games, Steam Keys, Bundles, Steam downloads ] -
www.indiegala.com
Not sure if it is my very old setup or an actual bug in Sopel with this URL. Thank you for reading. :)