Topic

crawling

Repositories (1350)

scrapy-mcp-server
scrapy-mcp-server scrapoxy

MCP server that enables self-healing automatic repair of Scrapy spiders. When websites change, your scrapers fix themselves.

17
CSCI572-Information_Retrieval_And_Web_Search_Engines
CSCI572-Information_Retrieval_And_Web_Search_Engines Keerthivasan13 Java

Search Engine projects

17
pyReptile
pyReptile xyjw Python

web crawling & scraping framework for Python

16
node-vgmusic-downloader
node-vgmusic-downloader gogson JavaScript

Node.js tool for downloading all free MIDI files on VGMusic.com

16
free-llmstxt-generator
free-llmstxt-generator moinulmoin TypeScript

converts webpage content into Markdown format, optimized for LLM training and context

16
browser-agent
browser-agent lightfeed TypeScript

Replayable Browser Agent

16
WebSearch
WebSearch iTeam-S Python

Python module allowing you to do various searches for links on the Web.

16
crawlly
crawlly gaurav-gogia Go

A simple web crawler in Go

15
kasthack.osp
kasthack.osp kasthack-labs C#

Generator of raw VK user dumps.

15
facebook-scraper-for-non-english-user
facebook-scraper-for-non-english-user dizwe Python

Crawling Facebook pages

15
pumba
pumba sultaniman Elixir

Fetch, store and access user agent strings for different browsers

15
twitter-account-data-crawler
twitter-account-data-crawler somniLegacy Python

Crawl and track the follower count of a Twitter account

15
re-employment-kraken
re-employment-kraken uschtwill JavaScript

re-employment-kraken scrapes (job) sites, remembers what it saw and notifies downstream systems of any new sightings.

15
darklight
darklight bunseokbot Python

Engine for collecting onion domains and crawling webpages on the Tor network

15
velog-dashboard
velog-dashboard Check-Data-Out JavaScript

(2023.11) Full-stack velog statistics dashboard

15
img-cli
img-cli selmi-karim JavaScript

An interactive command-line interface built in Node.js for downloading one or more images to disk from a URL

14
nutch-solr-integration
nutch-solr-integration basraven

An ultra-small PoC showing how to combine Apache Nutch and Apache Solr: crawling web pages and storing the results in Solr for querying

14
omkar-temp-mail
omkar-temp-mail omkarcloud Python

🚀 OMKAR TEMP MAIL HELPS YOU USE TEMPORARY EMAILS. 🤖

14
React-YouTube-Comment-Section-Scraper
React-YouTube-Comment-Section-Scraper MikeM711 JavaScript

A full stack application that scrapes & filters YouTube comments using Google's Puppeteer, instead of using the YouTube API

13
openclaw-ultra-scraping
openclaw-ultra-scraping LeoYeAI Python

🕷️ Adaptive web scraping skill for OpenClaw agents — bypasses anti-bot, survives site redesigns. Powered by MyClaw.ai

13
firecrawler
firecrawler sammcj JavaScript

A lightweight frontend for self-hosted Firecrawl instances

13
wikipedia-externallinks-fast-extraction
wikipedia-externallinks-fast-extraction lovasoa Rust

Fast extraction of all external links from wikipedia

13
Arachnida
Arachnida guillim JavaScript

App to scrape the web, for people without coding skills. Fully integrates web crawlers (headless Chrome) and the interface to manage them.

13
Octocrawl
Octocrawl b3rt1ng Python

Fast, parallel and easy to use web crawler for penetration testing and bug bounty

13
FinalProject-Datascience
FinalProject-Datascience HungTrinhIT Jupyter Notebook

Final project for the Applied Data Science course. Collects data by parsing HTML and uses machine learning models to answer the questions posed...

13
Web-Crawling-Stock-Data-
Web-Crawling-Stock-Data- Ariannahs Jupyter Notebook

Crawling stock data from Eastmoney (东方财富网)

13
Cross-The-Floor
Cross-The-Floor drkostas HTML

Uses Sankey Diagrams to visualize politicians that have "crossed the floor" from election to election.

13
house-bob
house-bob peterwade153 Python

A Django application for scraping properties with Scrapy.

13
smd
smd adbenitez Python

Simple Manga Downloader, a tool to search and download manga

12
insta-downloader
insta-downloader amirzenoozi Python

Download Instagram posts with this script

12
googlescholar-crawler
googlescholar-crawler debanjanmahata

A crawler for collecting papers from Google Scholar (http://scholar.google.com). Credit for this code goes to (https://github.com/ckreibich/sch...

12
TGCrawl
TGCrawl Puzzaks Dart

Telegram channel relations analyzer

12
crawl-tiki-products
crawl-tiki-products chidokun Python

Demonstration of crawling laptop products on the Tiki e-commerce website

12
route-waypoints
route-waypoints hkbus Python

Crawling route waypoints for HK bus routes

12
oda
oda jens-ox TypeScript

Extraction, versioning and machine-readable provisioning of public data.

12
node-raspar
node-raspar kodjunkie JavaScript

🕷️ Easily scrape the web for torrent and media files.

12
SECTOOL
SECTOOL orangmuda Shell

Search engine scraper tool (Bash)

12
crawler-ts
crawler-ts supergillis TypeScript

Crawler written in TypeScript using ES6 generators.

12
mb-checker
mb-checker juvalen Python

Python scripts: the first traverses the Chrome bookmarks file and the second removes stale entries. Includes a Jenkinsfile to generate Docker images.

12
clever_searcher
clever_searcher Azzedde Python

Intelligent web discovery agent with LLM-powered planning, multi-source search, smart deduplication, and GRPO preference dataset collection. Autonomou...

12
fundus-evaluation
fundus-evaluation dobbersc Python

[ACL 2024] Evaluation of the Fundus News Scraper

12
shopee-crawler
shopee-crawler hoangkimminh JavaScript

Simple scripts for crawling shopee's shop and product information from shopee.vn

12
WeiboStockAnalysisCharts
WeiboStockAnalysisCharts rubinliudongpo HTML

Crawls Chinese stock recommendations from Sina Weibo and builds pyecharts charts from the data

11
data_camp_wcr
data_camp_wcr Beomi Python

Source code for cohorts 1-2 of the "Practical Web Crawling with Python" CAMP course

11
newscorpus
newscorpus gambolputty Python

Docker🐳 setup for automated news article crawling from German news websites. Written in Python🐍, uses MongoDB

11
Quotes-Crawling
Quotes-Crawling ParhamPishro Jupyter Notebook

Crawl Anne Shirley's quotes from the web

11
dropship-trend-crawler
dropship-trend-crawler nabz0r JavaScript

A sophisticated data-driven system that revolutionizes product discovery for dropshipping businesses. Unlike traditional web crawlers, this platform l...

11
ResearchGateCrawler
ResearchGateCrawler SMSadegh19 Python

Python script for crawling ResearchGate.net papers.✨⭐️📎

11
tarantula
tarantula rly0nheart Python

Python web crawler tool

11
Scraping-IMDB
Scraping-IMDB RaedAddala Jupyter Notebook

This Python script extracts comprehensive movie data from IMDB, focusing on top-grossing movies from 1920 to 2025. The scraper collects detailed infor...

11