web-archives

archivesunleashed / notebooks

Example notebooks for working with web archives using the Archives Unleashed Toolkit, and with derivatives generated by the toolkit.

spark python3 notebooks web-archives pyspark-notebook juypter-notebook

Updated Dec 5, 2022
Jupyter Notebook

N0taN3rd / node-warc

Parse and create Web ARChive (WARC) files with Node.js

warc web-archiving webarchive web-archives webarchiving warc-files chrome-remote-interface pupeteer

Updated Jan 3, 2023
JavaScript

caltechlibrary / eprints2archives

Send records from an EPrints server to the Internet Archive and other web archives

python terminal archiving internet-archive memento web-archiving preservation web-archives eprints

Updated May 15, 2023
Python

sebastian-nagel / warc-crawler

Process web archives (WARC format) with StormCrawler and index content into Elasticsearch or Solr

elasticsearch solr apache-storm warc web-archives warc-files stormcrawler

Updated Nov 24, 2023
FLUX

ukwa / ukwa-ui

A new user interface for the UK Web Archive

web-archiving web-archives

Updated Apr 10, 2024
Java

wsdookadr / warctools

WARC tools for web archives: joining, finding and fetching missing resources, accessing metadata, converting to ZIM, and offline viewing.

offline http-archive warc zim metadata-extraction web-archives

Updated Aug 4, 2024
Python

Here are 27 public repositories matching this topic:

bhouston1982 / staticPages-webArchives

web-archive-group / wadl2017

N0taN3rd / node-cdxj

helgeho / Tempas2ArchiveSpark

oduwsdl / offtopic-goldstandard-data

ukwa / waybacks

hrbrmstr / cdx

k12stemaker / k12stemaker.github.io

tigercosmos / web-archives

nchylak / capstone-project

oduwsdl / MementoEmbed

oduwsdl / raintale

lanl / Zotero-Robust-Links-Extension

ukwa / ukwa-gsheets-utils

archivesunleashed / notebooks

N0taN3rd / node-warc

caltechlibrary / eprints2archives

sebastian-nagel / warc-crawler

ukwa / ukwa-ui

wsdookadr / warctools
