In order to retrieve video info, the TED scraper fetches the video page with a URL like https://ted.com/talks/franco_sacchi_a_tour_of_nollywood_nigeria_s_booming_film_industry?language=nl and looks for the `__NEXT_DATA__` JSON inside the page, where it finds, among other things, the localized title and description. This is done in the `extract_info_from_video_page` function in `scraper.py`.

A few recipes are currently failing intermittently with the error `An error occurred: 'NoneType' object has no attribute 'string'`. Looking at the HTML content, there is no `__NEXT_DATA__` JSON inside the page. When I reload the page on my machine, the `__NEXT_DATA__` JSON is there. So the scraper clearly needs to be more resilient to intermittent bad responses from the TED server.

This was indeed the case in 2.10.0, where `extract_info_from_video_page` had retry logic. That logic was dropped in https://github.com/openzim/ted/pull/130/files when adapting to the new DOM. I think we should simply restore it: pause 5 seconds and try again, up to 5 times, just like in 2.10.0.

Since we currently have no plan for when we will be able to work on 3.1.0, and since this bug makes the success of https://farm.openzim.org/recipes/ted_topic_all mostly impossible, I'm going to make a patch release, 3.0.3.
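A minimal sketch of the proposed retry (wait 5 seconds, up to 5 attempts), written in Python like the scraper itself. The `'NoneType' object has no attribute 'string'` error is consistent with a lookup such as `soup.find(...)` returning `None` when the `<script id="__NEXT_DATA__">` tag is missing. The sketch therefore treats a missing or unparsable payload as a bad response and retries. `parse_next_data`, `fetch_next_data` and the `fetch` callable (for instance something wrapping `requests.get(url).text`) are hypothetical names, not the scraper's actual API. It uses the standard library `html.parser` instead of BeautifulSoup only to stay self-contained.

```python
import json
import time
from html.parser import HTMLParser
from typing import Callable, Optional


class _NextDataParser(HTMLParser):
    """Collects the text of the <script id="__NEXT_DATA__"> element, if present."""

    def __init__(self) -> None:
        super().__init__()
        self._inside = False
        self.payload: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("id") == "__NEXT_DATA__":
            self._inside = True
            self.payload = ""

    def handle_endtag(self, tag):
        if tag == "script":
            self._inside = False

    def handle_data(self, data):
        # Script content may arrive in several chunks, so accumulate it.
        if self._inside:
            self.payload += data


def parse_next_data(html: str) -> Optional[dict]:
    """Hypothetical helper: return the decoded __NEXT_DATA__ JSON, or None if absent/invalid."""
    parser = _NextDataParser()
    parser.feed(html)
    parser.close()
    if not parser.payload:
        return None  # tag missing or empty: treat as a bad response
    try:
        return json.loads(parser.payload)
    except json.JSONDecodeError:
        return None  # truncated or garbled payload: also a bad response


def fetch_next_data(
    url: str,
    fetch: Callable[[str], str],  # hypothetical: returns the page HTML for a URL
    attempts: int = 5,
    delay: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Fetch `url` until __NEXT_DATA__ is found, pausing `delay` seconds between tries."""
    for attempt in range(1, attempts + 1):
        data = parse_next_data(fetch(url))
        if data is not None:
            return data
        if attempt < attempts:
            sleep(delay)  # no pause after the final attempt
    raise ValueError(f"no __NEXT_DATA__ in {url} after {attempts} attempts")
```

Injecting `fetch` and `sleep` keeps the retry loop testable without network access or real 5-second waits. Raising an explicit error after the last attempt also gives the recipe log a clearer message than the `AttributeError` currently seen.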