1 repository on SrcLog
Modular R web-scraping framework for German newspapers: crawls sitemaps, aggregates links by date range, and extracts target HTML fields using the paperboy package