Most popular crawler repositories and open source projects

go_jobs

A look at the Golang job market

123   612   612  

newcrawler

Free Web Scraping Tool with Java

115   582   582  

google-play-scraper

Google play scraper for Python inspired by <facundoolano/google-play-s...

155   582   582  

pywebcopy

Locally saves webpages to your hard disk with images, css, js & links...

113   575   575  

runoob-PDF-

Crawls the Runoob (菜鸟教程) tutorial site and converts it to PDF__python_crawer_by_chrome

365   569   569  

scrapedin

LinkedIn Scraper (currently working 2020)

173   566   566  

jvppeteer

Headless Chrome For Java (Java crawler)

135   557   557  

XHS-Spider

Xiaohongshu (小红书) data collection and batch download tool for site images and video resources; a great-looking data collection tool...

68   556   556  

learnPython

Basic Python practice code and assorted crawler code

302   549   549  

FictionDown

Novel download | novel crawling | Qidian | Biquge | export Markdown | export txt | convert to epub | ad filtering |...

110   548   548  

hacker-news-digest

📰 Let ChatGPT Summarize Hacker News for You

74   522   522  

nintendo-switch-eshop

Crawler for Nintendo Switch eShop

83   511   511  

Scan-T

A new Python-based crawler with more functions, including Network fin...

233   510   510  

TumblThree

A Tumblr and Twitter Blog Backup Application

61   509   509  

scrapple

A framework for creating semi-automatic web content extractors

41   501   501  

crawljax

Crawljax

227   493   493  

opensearchserver

Open-source Enterprise Grade Search Engine Software

194   488   488  

python-fxxk-spider

A collection of free Python crawler projects

123   481   481  

Html2Article

Main-content extraction from HTML web pages

181   476   476  

vault

swiss army knife for hackers

95   471   471  

python-automation-scripts

Simple yet powerful automation scripts.

158   466   466  

mmjpg

👩 Crawler for model photo sets (part 1)

246   462   462  

freshonions-torscraper

Fresh Onions is an open source TOR spider / hidden service onion crawl...

144   457   457  

webster

a reliable high-level web crawling & scraping framework for Node.js.

57   457   457  

ICLR2020-OpenReviewData

Script that crawls meta data from ICLR OpenReview webpage. Tutorials o...

42   453   453  

dude

dude uncomplicated data extraction: A simple framework for writing web...

19   429   429  

signature_algorithm

Request-signing or encryption algorithms for various apps, mini programs, and websites. Currently covers: Ziroom, Xiaohongshu, Danke...

73   418   418  

SpiderSuite

Advanced web spider/crawler for cyber security professionals

48   414   414  

Youtube-Projects

This repository contains all the code I use in my YouTube tutorials.

236   410   410  

music-recover

🎵 Converts cache files to MP3 files

121   406   406  

jivesearch

A search engine that doesn't track you.

53   402   402  

Python3Webcrawler

🌈 Python 3 web crawling in practice: QQ Music songs, JD.com product information, Fang.com, cracking Youdao Translate...

103   402   402  

tsrtc

Real-time Taiwan stock crawler (Taiwan Stock Exchange Real Time Crawler)

143   400   400  

videodl

Videodl: A lightweight video downloader written by pure python.

132   392   392  

ICLR2019-OpenReviewData

Script that crawls meta data from ICLR OpenReview webpage. Tutorials o...

36   389   389  

TTBot

Toutiao (今日头条) bot; supports user login, follow, unfollow, fetching followers, publishing articles, 发悟...

145   377   377  

CrawlerForReader

Local web-novel crawler for Android, based on jsoup and XPath

136   374   374  

webpalm

🕸️ Crawl in the web network

39   372   372  

lxBook

Code repository for the book 《爬虫逆向进阶实战》 (Advanced Crawler Reverse Engineering in Practice)

114   370   370  

InstagramCrawler

A non API python program to crawl public photos, posts or followers

110   368   368  

weixin-spider

WeChat official account crawler: historical articles, article comments, article read and "Wow" (在看) counts, visual we...

90   368   368  

scrapy-zyte-smartproxy

Zyte Smart Proxy Manager (formerly Crawlera) middleware for Scrapy

89   364   364  

gospider

Crawler framework implemented in Golang; users only need to care about page rules, and a web management UI is provided. Based on col...

104   363   363  

sitemap-generator

Easily create XML sitemaps for your website.

129   362   362  

seo-audits-toolkit

SEO & Security Audit for Websites. Lighthouse & Security Headers crawl...

79   356   356  

zhihu-login

Simulated Zhihu login; supports extracting CAPTCHAs and saving cookies

140   355   355  

Rcrawler

An R web crawler and scraper

92   353   353  

ghcrawler

Crawl GitHub APIs and store the discovered orgs, repos, commits, ...

90   352   352  

supercrawler

A web crawler. Supercrawler automatically crawls websites. Define cust...

66   351   351  

JSSoup

JavaScript + BeautifulSoup = JSSoup

37   349   349