Topic

crawling

Repositories (1230)

Sasila
Sasila da2vin Python

A flexible, friendly crawler framework

296
Instagram-Bot
Instagram-Bot mustafadalga Python

An Instagram bot developed using the Selenium Framework

281
scrapper
scrapper amerkurev Python

Web scraper with a simple REST API living in Docker and using a Headless browser and Readability.js for parsing.

266
antch
antch antchfx Go

Antch, a fast, powerful and extensible web crawling & scraping framework for Go

263
laravel
laravel roach-php PHP

Laravel adapter for Roach, the complete web scraping toolkit for PHP.

259
Infect
Infect mishakorzik Shell

Create your virus in Termux!

231
N2H4
N2H4 forkonlp R

A tool for collecting Naver news

217
Grawler
Grawler A3h1nt PHP

Grawler is a tool written in PHP which comes with a web interface that automates the task of using Google dorks, scrapes the results, and stores them...

213
corpuscrawler
corpuscrawler google Python

Crawler for linguistic corpora

205
facebook-data-extraction
facebook-data-extraction 18520339 Python

Experience for effectively fetching Facebook data by Querying Graph API with Account-based Token and Operating undetectable scraping Bots to extract C...

205
estela
estela bitmakerla TypeScript

estela, an elastic web scraping cluster 🕸

185
SpideyX
SpideyX RevoltSecurities Python

SpideyX, a multipurpose web penetration testing tool with asynchronous concurrent performance and multiple modes and configurations.

178
DotnetCrawler
DotnetCrawler mehmetozkaya C#

DotnetCrawler is a straightforward, lightweight web crawling/scraping library for Entity Framework Core output based on dotnet core. This library des...

176
Squidwarc
Squidwarc N0taN3rd JavaScript

Squidwarc is a high fidelity, user scriptable, archival crawler that uses Chrome or Chromium with or without a head

170
massivedl
massivedl dimkouv Go

Download a large list of files concurrently

161
crawler
crawler trandoshan-io Go

Go process used to crawl websites

150
cdp4j
cdp4j webfolderio Java

cdp4j - Chrome DevTools Protocol for Java

144
sasori
sasori karthikuj JavaScript

Sasori is a dynamic web crawler powered by Puppeteer, designed for lightning-fast endpoint discovery.

144
courlan
courlan adbar Python

Clean, filter and sample URLs to optimize data collection – Python & command-line – Deduplication, spam, content and language filters

142
double-agent
double-agent unblocked-web TypeScript

A test suite of common scraper detection techniques. See how detectable your scraper stack is.

140
scraply
scraply alash3al Go

Scraply, a simple DOM scraper to fetch information from any HTML-based website

129
proxifier
proxifier rookmoot Go

A fast, modern and intelligent proxy rotator perfect for crawling and scraping public data.

128
wget-lua
wget-lua ArchiveTeam C

Wget-AT is a modern Wget with Lua hooks, Zstandard (+dictionary) WARC compression and URL-agnostic deduplication.

125
pdf-crawler
pdf-crawler SimFin Python

SimFin's open source PDF crawler

124
aioscpy
aioscpy ihandmine Python

An asyncio + aiolibs crawler that imitates the Scrapy framework

124
sitemapper
sitemapper seantomburke TypeScript

Parse through any sitemap in Node.js

122
LinkedIn-Skills-Crawler
LinkedIn-Skills-Crawler varadchoudhari Python

A simple Python script to crawl the complete list of LinkedIn skills

121
bots-zoo
bots-zoo antoinevastel JavaScript

115
jkcrawler
jkcrawler topiccrawler Python

A JK crawler written with Scrapy; images are sourced from Bilibili, Tumblr, Instagram, Weibo, and Twitter

114
warc-parquet
warc-parquet maxcountryman Rust

🗄️ A simple CLI for converting WARC to Parquet.

112
burp-dom-scanner
burp-dom-scanner fcavallarin Java

Burp Suite's extension to scan and crawl Single Page Applications

105
dig-etl-engine
dig-etl-engine usc-isi-i2

Download DIG to run on your laptop or server.

103
AyugeSpiderTools
AyugeSpiderTools shengchenyang Python

Lets Scrapy developers skip writing modules such as item, pipeline, and middleware for common scenarios, freeing up their hands.

98
devdocs-to-llm
devdocs-to-llm alexfazio Jupyter Notebook

Turn any developer documentation into a GPT

98
bathyscaphe
bathyscaphe creekorful Go

Fast, highly configurable, cloud native dark web crawler.

94
ARGUS
ARGUS datawizard1337 Python

ARGUS is an easy-to-use web scraping tool. The program is based on the Scrapy Python framework and is able to crawl a broad range of different website...

88
robots.txt
robots.txt jonasjacek

Simple robots.txt template. Keeps unwanted robots out (disallow). Whitelists (allow) legitimate user-agents. Useful for all websites.

86
spidercreator
spidercreator carlosplanchon Python

Automated web scraping spider generation using Browser Use and LLMs. Streamline the creation of Playwright-based spiders with minimal manual coding. I...

79
abx-dl
abx-dl ArchiveBox JavaScript

⬇️ A simple all-in-one CLI tool to download EVERYTHING from a URL (like youtube-dl/yt-dlp, forum-dl, gallery-dl, simpler ArchiveBox). 🎭 Uses headless...

78
arachnid
arachnid watzon Crystal

Powerful web scraping framework for Crystal

78
feedsearch-crawler
feedsearch-crawler DBeath Python

Crawl sites for RSS, Atom, and JSON feeds.

77
goClone
goClone shurco Go

🌱 goClone - clone websites in seconds

76
tech-seo-crawler
tech-seo-crawler jroakes Python

Build a small, 3-domain internet using GitHub Pages and Wikipedia, and construct a crawler to crawl, render, and index it.

74
Harvester
Harvester TransparencyToolkit JavaScript

Web crawling and document processing through a usable interface.

72
Python-Crawling-Tutorial
Python-Crawling-Tutorial afunTW Jupyter Notebook

Python crawling tutorial

62
datacrawl
datacrawl DataCrawl-AI Python

A simple and easy to use web crawler for Python

62
crawling-projects
crawling-projects guptachetan1997 Python

Web scraping and automation using Python

62
custom-crawler
custom-crawler rollrat C#

🌌 High productivity semi-automatic crawler generator 🛠️🧰

60
scrapy-distributed
scrapy-distributed Insutanto Python

A series of distributed components for Scrapy. Including RabbitMQ-based components, Kafka-based components, and RedisBloom-based components for Scrapy...

60
pomp
pomp estin Python

Screen scraping and web crawling framework

59