Public Data
🦥
Common Crawl CDX API
🦔
Common Crawl files
WARCs, CDX files, Parquet...
⛵
End of Term Archive
WARCs, CDX files, Parquet URL index
🌍
Internet Archive Wayback
📚
UK Government Web Archive
Wayback
🦥
Webrecorder US GovArchive
High-fidelity replay
Tools & Software
🪘
ArchiveBox
A tool which maintains an...
⭐ 27k | 👀 183
🦥
SingleFile
Browser extension for...
⭐ 20k | 👀 138
📚
Flameshot
Screen capture and...
⭐ 29k | 👀 212
🎮
warcbench
A tool for exploring,...
⭐ 13 | 👀 2
🏓
warctools
Library to work with ARC...
⭐ 171 | 👀 38
🪘
Internet Archive Library
A command-line tool and...
⭐ 1k | 👀 54
In Development
🎨
ArchiveBox
A tool which maintains an...
⭐ 27k | 👀 183
🐆
DiskerNet
A non-WARC-based tool...
⭐ 3k | 👀 38
🪘
playback
A toolkit for searching...
⭐ 13 | 👀 2
📦
Squidwarc
An...
⭐ 175 | 👀 9
🐀
duckdb-web-archive-cdx
DuckDB extension to query...
📦
Warcat-rs
Command-line tool and Rust...
⭐ 30 | 👀 1
Stable
🪘
SingleFile
Browser extension for...
⭐ 20k | 👀 138
🛰️
Internet Archive Library
A command-line tool and...
⭐ 1k | 👀 54
⛵
monolith
CLI tool to save a web...
⭐ 15k | 👀 65
🌍
hyphe
A webcrawler built for...
⭐ 378 | 👀 28
🌍
Wayback
A toolkit for snapshot...
⭐ 2k | 👀 7
🛸
OutbackCDX
RocksDB-based capture...
⭐ 42 | 👀 19