Most popular crawler repositories and open source projects

Awesome-Scrapy

A Scrapy-based code library of data-collection crawlers

16   47   47  

xSMTP

xSMTP 🦟 Lightning fast, multithreaded smtp scanner targeting open-rel...

21   47   47  

webrtc-local-ip-leak

Oh no, stop this. You can see my local IP address 😲! Use `foundation`...

5   46   46  

codes-scratch-crawler

Reading notes for the book 《自己动手写网络爬虫》 (Write Your Own Web Crawler), with code I typed out myself. Mainly documents the basic...

21   46   46  

scrapy-admin

A Django admin site for Scrapy

12   46   46  

maman

Rust Web Crawler saving pages on Redis

6   46   46  

crawler

Chromium / Puppeteer site crawler

5   46   46  

gscholar-citations-crawler

Crawl all your citations from Google Scholar

11   46   46  

scrapy-kafka-redis

Distributed crawling/scraping, Kafka And Redis based components for S...

13   45   45  

jason-the-miner

⛏ A versatile Web scraper for Node.js

11   45   45  

bthello

Python3 DHT magnet-link torrent crawler, torrent parsing, torrent search, demo address

26   45   45  

local-api-client-typescript

Official JavaScript/TypeScript library for interacting with Kameleo Cl...

3   45   45  

Web-crawler-engineer-for-Python

Web-crawler-engineer-for-Python

23   44   44  

crawler-jsoup-maven

This is a crawler (reptile)

24   44   44  

dbworld-search

:mag: A simple search engine built on the Django framework

18   44   44  

seenreq

Generate an object for testing if a request is sent, request is Mikeal...

9   44   44  

shopify-app-store-scraper

Crawler behind the Shopify App Marketplace dataset

12   44   44  

jadwalsholatorg

Parsed data from website https://jadwalsholat.org

19   44   44  

FunUtils

Some code I wrote to help me with my daily errands ;)

51   43   43  

copyheaders

Conveniently copy browser headers from the browser

9   43   43  

anilist-crawler

Crawl data from anilist API and store in MariaDB.

3   43   43  

douyin-crawler

Douyin crawler. Crawls a user's posts and likes through a mobile proxy

11   43   43  

bluebird

Unofficial Python client for Twitter

14   43   43  

gogetcrawl

Extract web archive data using Wayback Machine and Common Crawl

7   43   43  

FF14AutoSignIn

Automatic sign-in script for the official FF14 China server website

7   43   43  

webmagician-ui

An admin UI project for a configurable web crawler platform

14   42   42  

Broken-Link-Crawler

:robot: Python bot that crawls your website looking for dead stuff

13   42   42  

nhentai-imgcollect

:rocket: A multithreaded Python nhentai crawler with a PyQt5 GUI

10   42   42  

scaling-to-distributed-crawling

Repository for the Mastering Web Scraping in Python: Scaling to Distri...

9   42   42  

steam-discount

Steam discounted games chart (auto-refreshing)

15   42   42  

aio-vextractor

Parses video information from video website/app/H5 pages. Supports Douyin, Tencent Video, YouTube, Instag...

11   41   41  

ronin-web

ronin-web is a collection of useful web helper methods and commands.

6   41   41  

PSGameSpider

Automatically crawls all game covers in the PlayStationStore, then automatically generates and indexes web pages # # #...

5   41   41  

js-cookie-monitor-debugger-hook

A handy tool for JS cookie reverse engineering: a visual monitor for JS cookie changes & a JS cookie hook for conditional...

10   41   41  

noscrape

This repository is deprecated

8   41   41  

spider.npm

Web crawler library; custom rules can handle most websites

7   41   41  

python-facebook-bot

Get facebook events from location with Python 3

16   41   41  

Crawler

:snake:A collection of simple Python crawlers.

15   41   41  

AntiCloudFlare

Counters the anti-crawler protection of the Cloudflare loading page (no longer works)

21   41   41  

crawler

Crawler framework for Node.js

9   41   41  

PyTse

TseTmc Crawler

8   41   41  

wx-crawl

Crawler for WeChat official account articles

12   41   41  

stock_linebot_public

The project for Linebot

9   41   41  

python-crawler

A repository for learning web crawling, suitable for complete beginners and fairly newcomer-friendly

14   41   41  

PageParser

A web page parser for crawlers; lets you write crawlers even without knowing how to parse web pages

16   41   41  

aristotle

highly customizable news collector

5   40   40  

php-crawler

:spider: A simple crawler (spider) written in PHP just for fun, with ze...

19   40   40  

laundry

Data laundering tools

4   40   40  

USTBCrawlers

Those years I spent crawling USTB. A focused-crawler tutorial that goes from basic to advanced.

7   40   40  

ncrawler

Web Crawler written in C#

14   40   40