Most popular crawler repositories and open source projects

Search-Engines-Scraper

Search google, bing, yahoo, and other search engines with python

113   338   338  

scrapy-zyte-smartproxy

Zyte Smart Proxy Manager (formerly Crawlera) middleware for Scrapy

89   337   337  

media-scraper

Scrapes all photos and videos in a web page / Instagram / Twitter / Tu...

47   332   332  

Rcrawler

An R web crawler and scraper

97   331   331  

second-order

Second-order subdomain takeover scanner

65   328   328  

Free_Proxy_Website

A collection of websites for obtaining free SOCKS/HTTPS/HTTP proxies

76   316   316  

ppspider

Web spider built with Puppeteer, supports task-queue and task-scheduling...

74   315   315  

CrawlerTutorial

A minimal crawler tutorial (fetch, parse, search, multiprocessing, API), using PTT as the example

101   312   312  

spidy

The simple, easy to use command line web crawler.

66   311   311  

polite

Be nice on the web

12   308   308  

gopa

GOPA, a spider written in Golang, for Elasticsearch. DEMO: http://inde...

82   296   296  

pychromeless

Python Lambda Chrome Automation (naming pending)

124   294   294  

Sasila

A flexible, friendly crawler framework

76   293   293  

chinese-fund-crawler

Crawling and aggregate analysis of Chinese off-exchange (OTC) fund data

159   290   290  

Laravel-Crawler-Detect

A Laravel wrapper for CrawlerDetect - the web crawler detection librar...

29   287   287  

PulsarRPA

Automate webpages at scale, scrape web data completely and accurately...

59   287   287  

line-bot-tutorial

LINE bot tutorial using Python Flask

148   286   286  

awesome-java-crawler

This repository collects and organizes crawler-related resources, mainly for Java

65   276   276  

oddish

Crawl csgo skin info from `buff.163.com` and steam, then find the most...

74   276   276  

Fast-LianJia-Crawler

A blazing-fast crawler that fetches data directly through the LianJia API, the fastest in the universe~~ 🚀

100   274   274  

Strong-Web-Crawler

An advanced web crawler based on C#.NET + PhantomJS + Selenium. Can execute JavaScript code...

154   272   272  

Instagram-Bot

An Instagram bot developed using the Selenium Framework

88   270   270  

sitemap-generator-cli

Creates an XML-Sitemap by crawling a given site.

41   268   268  

weiboPicDownloader

Crawler that downloads Weibo images without logging in

55   264   264  

Gorecon

Gorecon is an all-in-one reconnaissance tool, a.k.a. a Swiss knife for Re...

50   263   263  

algoliasearch-netlify

Official Algolia Plugin for Netlify. Index your website to Algolia whe...

10   259   259  

weibo_terminator_workflow

Updated version of weibo_terminator. This is the workflow version, aimed at Ge...

78   258   258  

bitextor

Bitextor generates translation memories from multilingual websites

44   255   255  

arachnid

Crawl all unique internal links found on a given website, and extract...

64   252   252  

Selenops

A Swift Web Crawler 🕷

18   252   252  

wencai

A wencai crawler (a Pythonic toolkit for the iWencai strategy-backtesting interface)

108   252   252  

Tumblr_Crawler

A multi-threaded crawler for Tumblr.

76   251   251  

FileSensor

Dynamic sensitive-file detection tool based on a crawler...

80   250   250  

chromium_for_spider

dynamic crawler for web vulnerability scanner

47   250   250  

antch

Antch, a fast, powerful and extensible web crawling & scraping framewo...

42   250   250  

Sub

Node crawling and filtering; supports Clash and base64 subscription parsing; automatically generates usable ss, ssr, v2ray,...

99   249   249  

spider

The fastest web crawler and indexer

34   246   246  

Github-spider

Crawler for analyzing GitHub repositories and users

88   241   241  

ComicCrawler

An image crawler written in Python.

46   238   238  

4chan-downloader

Python3 script to continuously download all images/webms of multiple 4...

32   237   237  

wscan

An open-source security assessment tool that supports scanning for common web security issues and custom POCs. In addition,...

27   236   236  

woid

Simple news aggregator displaying top stories in real time

122   235   235  

RuiJi.Net

crawler framework, distributed crawler extractor

45   234   234  

ok_ip_proxy_pool

🍿 Crawler proxy IP pool in Python 🍟 A pretty OK IP proxy pool

70   233   233  

QQMusicSpider

Scrapy-based QQ Music spider that crawls song information, lyrics, top comments...

61   229   229  

lightnovel_epub

🍭 EPUB generator for (light) novels. Supported sites: 轻...

20   228   228  

js-reverse

JS reverse-engineering research

83   227   227  

EmailFinder

Search emails from a domain through search engines

53   226   226  

weibo-topic-spider

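Weibo super-topic crawler with Weibo word-frequency statistics, sentiment analysis, and simple classification; newly added crawled data from the pneumonia super topic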

60   224   224  

goose-parser

Universal scraping tool, which allows you to extract data using multip...

16   223   223