Check Pinboard links for link rot and update dead links with an archive.org snapshot.

Fackelmann/PBLCA

Pinboard Link Checker and Archiver (PBLCA)

Lately I have become increasingly worried about the problem of link rot. I've been using Pinboard for more than 5 years, and I have over 1200 bookmarks, so out of curiosity I decided to check how many of these links were dead. The result was that more than 5% of the links no longer existed or redirected to a 403 page. Since I have been continuously adding bookmarks over this time, most of them are less than 5 years old, which means that the actual link rot rate is (significantly) higher than 5% per 5 years.
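To put a rough number on that last claim, here is a back-of-the-envelope sketch. It is not part of PBLCA, and it rests on two modeling assumptions: bookmarks were added at a uniform rate over the period, and every link dies at the same constant yearly rate. Under those assumptions the expected dead share is 1 - (1 - e^(-x)) / x with x = rate * years, which the hypothetical helpers below solve numerically.

```python
import math


def dead_fraction(rate: float, years: float) -> float:
    """Expected dead share when bookmark ages are uniform on [0, years]
    and each link dies at a constant `rate` per year (both assumptions)."""
    x = rate * years
    if x == 0:
        return 0.0
    return 1.0 - (1.0 - math.exp(-x)) / x


def solve_rate(observed: float, years: float) -> float:
    """Find the yearly rate that reproduces `observed`, by bisection.

    dead_fraction grows with the rate, so bisection on [0, 10] is enough
    for any realistic observation.
    """
    lo, hi = 0.0, 10.0
    for _ in range(100):
        mid = (lo + hi) / 2
        if dead_fraction(mid, years) < observed:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def rot_over_period(observed: float, years: float) -> float:
    """Share of links expected to die over one full `years` period."""
    rate = solve_rate(observed, years)
    return 1.0 - math.exp(-rate * years)
```

With the figures above, `rot_over_period(0.05, 5)` comes out at a little under 10%, roughly double the naive reading of the 5% result.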

I've also added the option to look up the snapshot of the dead link in the Internet Archive that is closest to the bookmark's creation date, and to update the bookmark so that it points to that snapshot. If no snapshot exists, the script will ask you whether you want to delete the bookmark or keep it.
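A minimal sketch of such a lookup, assuming the Internet Archive's public Wayback availability endpoint is used. This README does not say which endpoint PBLCA calls, and the function names here are hypothetical.

```python
import json
from datetime import datetime
from typing import Optional
from urllib.parse import urlencode
from urllib.request import urlopen

# Wayback Machine availability endpoint (assumed; not confirmed by this README).
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def availability_url(dead_url: str, created: datetime) -> str:
    """Build the query asking for the snapshot closest to `created`."""
    query = urlencode({
        "url": dead_url,
        # The endpoint expects a YYYYMMDDhhmmss timestamp.
        "timestamp": created.strftime("%Y%m%d%H%M%S"),
    })
    return f"{AVAILABILITY_ENDPOINT}?{query}"


def closest_snapshot(payload: dict) -> Optional[str]:
    """Return the snapshot URL from a decoded response, or None if absent."""
    closest = payload.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None


def find_snapshot(dead_url: str, created: datetime,
                  timeout: float = 10.0) -> Optional[str]:
    """Query the archive over the network and return the closest snapshot."""
    with urlopen(availability_url(dead_url, created), timeout=timeout) as resp:
        return closest_snapshot(json.load(resp))
```

A `None` result corresponds to the "no snapshot exists" case, where the user is asked whether to delete or keep the bookmark.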

PBLCA uses multiprocessing to query all your bookmarks, so it should be relatively fast. Checking 1250 of my bookmarks takes around 6 minutes.
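The general idea can be sketched as follows. This is an illustration of the technique with Python's standard `multiprocessing.Pool`, not PBLCA's actual code. The function names, the HEAD request, the User-Agent string, and the rule that any status of 400 or above counts as dead are all assumptions.

```python
import urllib.error
import urllib.request
from multiprocessing import Pool
from typing import List, Optional


def fetch_status(url: str, timeout: float = 10.0) -> Optional[int]:
    """Return the HTTP status for `url`, or None if the request failed."""
    try:
        request = urllib.request.Request(
            url, method="HEAD", headers={"User-Agent": "link-check-sketch"}
        )
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # 4xx/5xx responses arrive as exceptions; keep their status code.
        return error.code
    except (urllib.error.URLError, OSError, ValueError):
        # DNS failures, timeouts, refused connections, malformed URLs.
        return None


def is_dead(status: Optional[int]) -> bool:
    """Assumed rule: no response at all, or any 4xx/5xx status, is dead."""
    return status is None or status >= 400


def find_dead_links(urls: List[str], processes: int = 8) -> List[str]:
    """Check all URLs in a pool of worker processes and return the dead ones."""
    with Pool(processes=processes) as pool:
        statuses = pool.map(fetch_status, urls)
    return [url for url, status in zip(urls, statuses) if is_dead(status)]
```

On platforms that spawn worker processes (Windows, recent macOS), `find_dead_links` should be called from under an `if __name__ == "__main__":` guard.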

Usage

Clone the repo and cd to the folder:

git clone https://github.com/Fackelmann/PBLCA
cd PBLCA

Create the poetry virtual environment:

poetry update

Then run it, providing your Pinboard API token:

poetry run pblca --token USERNAME:API_TOKEN

If you don't have poetry installed, you'll need to install it first:

pip3 install poetry

Testing

To run the tests, go to the top-level directory and run:

make pytest

To run the mypy type checks:

make mypy

You will need to create config.py under tests with your Pinboard API token:

config.py

VALID_TOKEN = "USERNAME:TOKEN"
INVALID_TOKEN = "AN_INVALID:TOKEN"

TODO

  • Add options for batch processing

Known issues

  • There is no real way to update a bookmark via the Pinboard API, as the URL is the key. PBLCA will create a new bookmark with the same attributes (including creation date), and delete the old one. Not a real issue from a functionality perspective, but worth mentioning.
  • A (very) few bookmarks will show up as dead even though you can still access the page with your browser. It seems to be an issue with the headers.
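To illustrate the first issue, here is a sketch of the add-then-delete sequence. It only builds the two request URLs, so it can be read without touching a real account. The endpoint paths and parameter names follow the Pinboard v1 API as I understand it, and the helper name and the bookmark dictionary layout (as returned by posts/all) are assumptions rather than PBLCA's actual code.

```python
from typing import Tuple
from urllib.parse import urlencode

API_BASE = "https://api.pinboard.in/v1"


def replacement_requests(token: str, bookmark: dict,
                         new_url: str) -> Tuple[str, str]:
    """Build the posts/add and posts/delete URLs that swap a bookmark's URL.

    `bookmark` is assumed to look like one entry from posts/all, with
    keys such as href, description, extended, tags, time, shared, toread.
    """
    add_params = {
        "auth_token": token,
        "format": "json",
        "url": new_url,
        "description": bookmark["description"],
        "extended": bookmark.get("extended", ""),
        "tags": bookmark.get("tags", ""),
        # Reuse the original creation date so the copy keeps its place.
        "dt": bookmark["time"],
        "shared": bookmark.get("shared", "yes"),
        "toread": bookmark.get("toread", "no"),
    }
    delete_params = {
        "auth_token": token,
        "format": "json",
        "url": bookmark["href"],
    }
    add_url = f"{API_BASE}/posts/add?{urlencode(add_params)}"
    delete_url = f"{API_BASE}/posts/delete?{urlencode(delete_params)}"
    return add_url, delete_url
```

Issuing the add request first and the delete only after it succeeds means a failure midway leaves a duplicate rather than a lost bookmark.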

Disclaimer

  • Please use at your own risk. ALWAYS have a backup of your data.
