Building a graph of the Internet, one button at a time. Website available here.
This project crawls the links between 88x31s on the Internet, which are small badges on websites that link to other websites. It's split into three projects:
- A host server (Rust) that manages work between scraper nodes and talks to a Redis database using axum and fred
- A scraper (Rust) that talks to the server, fetches URLs, and returns information using scraper
- A web app (TypeScript/React) to render the graph using Cosmograph
Websites are scanned for images, and images that match the 88x31 resolution and link to another site are logged. It respects robots.txt and is aware of redirects.
Scrapers can either run a WebDriver or just parse the HTML - note that the latter will break discovery for websites that use JavaScript to create the button elements (e.g. React apps).
The scrapers respect robots.txt, so block this user agent (or allow if you want to opt in):
Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36 eightyeightthirtyone/1.0.0 (https://github.com/NotNite/eightyeightthirtyone)
Note that this only somewhat works:
- You will still appear on the graph if anyone else links to you.
- Previous entries in the database are not deleted - please email me your domain and affected URLs if known (see Credits & contact).
This project wouldn't be possible without the following people:
-
- for starting the project, hosting the website and server
-
- for her help on the frontend, designing the project's 88x31, hosting a scraper node
-
- for her help designing the Redis database schema, contributions to the server
-
- for lending us a server to host a scraper node on
If you have any questions or concerns, you can send an email to me (NotNite), which is available on my website (not written here for spam concerns).
Thanks!