Most popular crawling repositories and open source projects

scrapy-mcp-server scrapoxy

MCP server that enables self-healing automatic repair of Scrapy spiders. When websites change, your scrapers fix themselves.

17 1 17

CSCI572-Information_Retrieval_And_Web_Search_Engines Keerthivasan13 Java

Search Engine projects

17 17 17

pyReptile xyjw Python

web crawling & scraping framework for Python

16 7 16

node-vgmusic-downloader gogson JavaScript

Node.js tool for downloading all free MIDI files on VGMusic.com

16 5 16

free-llmstxt-generator moinulmoin TypeScript

converts webpage content into Markdown format, optimized for LLM training and context

16 1 16

browser-agent lightfeed TypeScript

Replayable Browser Agent

16 0 16

WebSearch iTeam-S Python

Python module allowing you to do various searches for links on the Web.

16 8 16

crawlly gaurav-gogia Go

A simple web crawler in Go

15 10 15

kasthack.osp kasthack-labs C#

Generator of raw dumps of VK users.

15 5 15

facebook-scraper-for-non-english-user dizwe Python

Crawling Facebook pages

15 4 15

pumba sultaniman Elixir

Fetch, store and access user agent strings for different browsers

15 1 15

twitter-account-data-crawler somniLegacy Python

Crawl and track the follower count of a Twitter account

15 2 15

re-employment-kraken uschtwill JavaScript

re-employment-kraken scrapes (job) sites, remembers what it saw and notifies downstream systems of any new sightings.

15 1 15

darklight bunseokbot Python

Engine for collecting onion domains and crawling web pages on the Tor network

15 4 15

velog-dashboard Check-Data-Out JavaScript

2023.11) velog statistics dashboard fullstack

15 1 15

img-cli selmi-karim JavaScript

An interactive command-line interface built in Node.js for downloading one or more images to disk from a URL

14 3 14

nutch-solr-integration basraven

An ultra-small PoC showing how to combine Apache Nutch and Apache Solr: crawling web pages and storing the results in Solr for querying

14 10 14

omkar-temp-mail omkarcloud Python

🚀 OMKAR TEMP MAIL HELPS YOU USE TEMPORARY EMAILS. 🤖

14 4 14

React-YouTube-Comment-Section-Scraper MikeM711 JavaScript

A full stack application that scrapes & filters YouTube comments using Google's Puppeteer, instead of using the YouTube API

13 6 13

openclaw-ultra-scraping LeoYeAI Python

🕷️ Adaptive web scraping skill for OpenClaw agents — bypasses anti-bot, survives site redesigns. Powered by MyClaw.ai

13 5 13

firecrawler sammcj JavaScript

A lightweight frontend for self-hosted Firecrawl instances

13 12 13

wikipedia-externallinks-fast-extraction lovasoa Rust

Fast extraction of all external links from wikipedia

13 3 13

Arachnida guillim JavaScript

App to scrape the web, for people without coding skills. Fully integrates web crawlers (headless Chrome) and an interface for managing them.

13 12 13

Octocrawl b3rt1ng Python

Fast, parallel and easy to use web crawler for penetration testing and bug bounty

13 1 13

FinalProject-Datascience HungTrinhIT Jupyter Notebook

Final project for the Applied Data Science course. Collects data by parsing HTML and uses machine learning models to answer the questions posed...

13 6 13

Web-Crawling-Stock-Data- Ariannahs Jupyter Notebook

Scraping stock data from Eastmoney (东方财富网)

13 6 13

Cross-The-Floor drkostas HTML

Uses Sankey Diagrams to visualize politicians that have "crossed the floor" from election to election.

13 0 13

house-bob peterwade153 Python

A Django application for scraping properties with Scrapy.

13 7 13

smd adbenitez Python

Simple Manga Downloader, a tool to search and download manga

12 0 12

insta-downloader amirzenoozi Python

Download Instagram posts with this script

12 3 12

googlescholar-crawler debanjanmahata

A crawler for collecting papers from Google Scholar (http://scholar.google.com). Credit for this code goes to (https://github.com/ckreibich/sch...

12 0 12

TGCrawl Puzzaks Dart

Telegram channel relations analyzer

12 3 12

crawl-tiki-products chidokun Python

Demonstration of crawling laptop products on the Tiki e-commerce website

12 10 12

route-waypoints hkbus Python

Crawling route waypoints for HK bus routes

12 12 12

oda jens-ox TypeScript

Extraction, versioning and machine-readable provisioning of public data.

12 0 12

node-raspar kodjunkie JavaScript

🕷️ Easily scrape the web for torrent and media files.

12 5 12

SECTOOL orangmuda Shell

Search engine scraper tool (Bash)

12 4 12

crawler-ts supergillis TypeScript

Crawler written in TypeScript using ES6 generators.

12 1 12

mb-checker juvalen Python

Python scripts: the first traverses the Chrome bookmarks file and the second removes stale entries. Includes a Jenkinsfile to generate Docker images.

12 0 12

clever_searcher Azzedde Python

Intelligent web discovery agent with LLM-powered planning, multi-source search, smart deduplication, and GRPO preference dataset collection. Autonomou...

12 0 12

fundus-evaluation dobbersc Python

[ACL 2024] Evaluation of the Fundus News Scraper

12 1 12

shopee-crawler hoangkimminh JavaScript

Simple scripts for crawling shop and product information from shopee.vn

12 9 12

WeiboStockAnalysisCharts rubinliudongpo HTML

Crawls Chinese stock recommendations from Sina Weibo and creates pyecharts charts from the data

11 3 11

data_camp_wcr Beomi Python

Source code for cohorts 1-2 of the "Practical Web Crawling with Python" CAMP course

11 6 11

newscorpus gambolputty Python

Docker🐳 setup for automated news article crawling from German news websites. Written in Python🐍, uses MongoDB

11 2 11

Quotes-Crawling ParhamPishro Jupyter Notebook

Crawl Anne Shirley's quotes from the web

11 0 11

dropship-trend-crawler nabz0r JavaScript

A sophisticated data-driven system that revolutionizes product discovery for dropshipping businesses. Unlike traditional web crawlers, this platform l...

11 2 11

ResearchGateCrawler SMSadegh19 Python

Python script for crawling ResearchGate.net papers.✨⭐️📎

11 0 11

tarantula rly0nheart Python

Python web crawler tool

11 3 11

Scraping-IMDB RaedAddala Jupyter Notebook

This Python script extracts comprehensive movie data from IMDB, focusing on top-grossing movies from 1920 to 2025. The scraper collects detailed infor...

11 4 11

crawling

Repositories (1350)