Filter-out non-public videos and properly cleanup unsuccessful videos #363

Merged on Oct 17, 2024 (1 commit)
4 changes: 4 additions & 0 deletions CHANGELOG.md
@@ -11,6 +11,10 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 
 - Raise exception if there are no videos in the playlists (#347)
 
+### Fixed
+
+- Filter-out non-public videos and properly cleanup unsuccessful videos (#362)
+
 ## [3.2.0] - 2024-10-11
 
 ### Deprecated
9 changes: 5 additions & 4 deletions scraper/src/youtube2zim/scraper.py
@@ -82,6 +82,7 @@
 get_videos_json,
 save_channel_branding,
 skip_deleted_videos,
+skip_non_public_videos,
 skip_outofrange_videos,
 )
 
@@ -611,6 +612,7 @@
 )
 filter_videos = filter(skip_outofrange, videos_json)
 filter_videos = filter(skip_deleted_videos, filter_videos)
+filter_videos = filter(skip_non_public_videos, filter_videos)
 all_videos.update(
 {v["contentDetails"]["videoId"]: v for v in filter_videos}
 )
Codecov (codecov/patch) warning: added line #L615 in scraper/src/youtube2zim/scraper.py was not covered by tests.
@@ -1038,10 +1040,9 @@
 def make_json_files(self, actual_videos_ids):
 """Generate JSON files to be consumed by the frontend"""
 
-def remove_unused_videos(videos):
-video_ids = [video["contentDetails"]["videoId"] for video in videos]
+def remove_unused_videos():
 for path in self.videos_dir.iterdir():
-if path.is_dir() and path.name not in video_ids:
+if path.is_dir() and path.name not in actual_videos_ids:
 logger.debug(f"Removing unused video {path.name}")
 shutil.rmtree(path, ignore_errors=True)
 
Codecov (codecov/patch) warning: added line #L1043 in scraper/src/youtube2zim/scraper.py was not covered by tests.
@@ -1282,7 +1283,7 @@
 )
 
 # clean videos left out in videos directory
-remove_unused_videos(videos)
+remove_unused_videos()
 
 def add_file_to_zim(
 self,
Codecov (codecov/patch) warning: added line #L1286 in scraper/src/youtube2zim/scraper.py was not covered by tests.
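The `make_json_files` change makes cleanup depend on `actual_videos_ids` (presumably the IDs of videos that were processed successfully) instead of on every playlist entry, so directories left behind by unsuccessful downloads are now deleted too. Below is a minimal standalone sketch of that logic, assuming a flat `videos_dir` with one subdirectory per video ID. The function name `remove_unused_video_dirs`, its explicit parameters, its return value and the sample IDs are hypothetical; in the PR, the nested `remove_unused_videos()` closes over `self.videos_dir` and `actual_videos_ids` and returns nothing.

```python
from __future__ import annotations

import logging
import shutil
import tempfile
from pathlib import Path

logger = logging.getLogger(__name__)


def remove_unused_video_dirs(videos_dir: Path, actual_videos_ids: set[str]) -> list[str]:
    """Delete each subdirectory of videos_dir whose name is not a kept video ID.

    Hypothetical helper mirroring the diff; returns the sorted names of the
    removed directories so the behaviour can be checked.
    """
    removed = []
    # Snapshot the listing before deleting anything inside the directory
    for path in list(videos_dir.iterdir()):
        # Plain files are left alone; only unexpected video directories go
        if path.is_dir() and path.name not in actual_videos_ids:
            logger.debug(f"Removing unused video {path.name}")
            shutil.rmtree(path, ignore_errors=True)
            removed.append(path.name)
    return sorted(removed)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        videos_dir = Path(tmp)
        # Hypothetical IDs: "abc" and "def" succeeded, "ghi" failed mid-download
        for video_id in ("abc", "def", "ghi"):
            (videos_dir / video_id).mkdir()
        print(remove_unused_video_dirs(videos_dir, {"abc", "def"}))  # ['ghi']
        print(sorted(p.name for p in videos_dir.iterdir()))  # ['abc', 'def']
```

Passing the IDs in explicitly, rather than closing over them, lets the cleanup be exercised in isolation, which could help with the uncovered lines reported by Codecov.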
7 changes: 6 additions & 1 deletion scraper/src/youtube2zim/youtube.py
@@ -190,7 +190,7 @@
 PLAYLIST_ITEMS_API,
 params={
 "playlistId": playlist_id,
-"part": "snippet,contentDetails",
+"part": "snippet,contentDetails,status",
 "key": YOUTUBE.api_key,
 "maxResults": RESULTS_PER_PAGE,
 "pageToken": page_token,
@@ -309,6 +309,11 @@
 )
 
 
+def skip_non_public_videos(item):
+"""filter func to filter-out non-public videos"""
+return item["status"]["privacyStatus"] == "public"
+
+
 def skip_outofrange_videos(date_range, item):
 """filter func to filter-out videos that are not within specified date range"""
 return dt_parser.parse(item["snippet"]["publishedAt"]).date() in date_range
Codecov (codecov/patch) warning: added lines #L312 and #L314 in scraper/src/youtube2zim/youtube.py were not covered by tests.
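The new predicate depends on the playlist-items request also asking for the `status` part. The YouTube Data API only returns the resource parts listed in `part`, so without `status` the lookup of `item["status"]` would fail. The sketch below replays the filtering from `scraper.py` on hand-written playlist items. The IDs and the trimmed item shape are assumptions, and the date-range and deleted-video stages that precede this filter in `scraper.py` are omitted. It keys the surviving items by `contentDetails.videoId`, as the diff does.

```python
def skip_non_public_videos(item):
    """filter func to filter-out non-public videos (same body as in the PR)"""
    return item["status"]["privacyStatus"] == "public"


def make_item(video_id, privacy_status):
    """Build a trimmed, hypothetical playlistItems entry with only the fields used here."""
    return {
        "contentDetails": {"videoId": video_id},
        "status": {"privacyStatus": privacy_status},
    }


videos_json = [
    make_item("vid_public", "public"),
    make_item("vid_private", "private"),
    make_item("vid_unlisted", "unlisted"),
]

# filter() is lazy: in scraper.py this is the last of three chained stages,
# and nothing is evaluated until the dict comprehension consumes the chain.
filter_videos = filter(skip_non_public_videos, videos_json)

all_videos = {}
all_videos.update({v["contentDetails"]["videoId"]: v for v in filter_videos})

print(list(all_videos))  # ['vid_public']
```

The check is an equality test against `"public"`, so unlisted videos are dropped along with private ones.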