Most popular crawler repositories and open source projects

ir guilhermecgs Python

Project for automatically calculating income tax (Imposto de Renda) on Bovespa trades. Tags: canal eletronico do investidor (electronic investor channel), CEI, selenium, bovespa, IRPF, IR,...

174 49 174

spoon Jiramew Python

🥄 A package for building site-specific proxy pools.

173 23 173

HotNewsAnalysis Jacen789 Python

Analysis of trending news topics using text mining techniques

172 49 172

mm131 qwertyuiop6 Python

Image scraper for the MM131 website :rotating_light:

171 51 171

smarter-encryption duckduckgo Perl

171 40 171

urlbuster cytopia Python

Powerful mutable web directory fuzzer to bruteforce existing and/or hidden files or directories.

169 29 169

crawler-china-mainland-universities codeudan JavaScript

Crawler for the list of universities in mainland China

167 48 167

font_obfuscator solarhell Rust

Font obfuscation service

167 19 167

courlan adbar Python

Clean, filter and sample URLs to optimize data collection – Python & command-line – Deduplication, spam, content and language filters

167 13 167

dl_coursera FLZ101 Python

A simple, fast and reliable Coursera crawling & downloading tool

166 34 166

DouyuBarrage-Pro Crawler995 TypeScript

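(Latest for 2020) Second version of a Douyu danmaku (bullet comment) capture and visual management platform, providing danmaku capture, real-time visualization of danmaku sending rate, capture record lookup, danmaku download, custom keyword statistics, loyal-fan statistics, automatic highlight moments...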

166 27 166

XVideos-PornHub-RedTube-API Joel2B PHP

This script scrapes the HTML from different web pages to get the information from the video (XVideos, PornHub, RedTube) and you can use it in your own...

166 42 166

WeiboCrawler XWang20 Python

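Cookie-free Weibo crawler that can continuously crawl the profile information, posts, and post comments and reposts of one or more Sina Weibo users.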

166 24 166

NLP-Twitter h4m5t Python

Twitter crawler

165 20 165

python-dcdownloader dev-techmoe Python

A fully asynchronous batch manga downloader (crawler) for dmzj (动漫之家), written in Python

164 18 164

ScriptSpider xjtushilei Java

A componentized, distributed, general-purpose crawler framework in Java.

164 75 164

CrawlBox abaykan Python

An easy way to brute-force web directories.

163 42 163

soksaccounts chenjiandongx Python

🔥 Shadowsocks account crawler

162 51 162

fun_crawler ZhangBohan Python

Crawl some pictures for fun

161 126 161

yispider 2young2simple Go

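A distributed crawler platform that helps you manage and develop crawlers. It ships with a built-in set of crawler definition rules (templates): use a template to define a crawler quickly, or treat the platform as a framework and develop crawlers by hand. (A hobby project, ...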

161 27 161

HttpCode.Core stulzq C#

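A simple, easy-to-use, efficient, and opinionated open-source .NET HTTP request framework! It can be used to build crawlers, make API requests, and more.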

161 61 161

douban-movie go-crawler Go

Golang crawler that scrapes the Douban Movie Top 250

161 65 161

starred yutao8 HTML

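A personal collection of popular GitHub projects (1.8k+), including development frameworks, components, SDKs, templates, API interfaces, IPTV, scripts, crawlers, cloud-drive direct links, open-source software, tools, and other projects.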

160 21 160

tracker-radar-collector duckduckgo JavaScript

🕸 Modular, multithreaded, puppeteer-based crawler

160 65 160

spidy twiny Go

Domain names collector - Crawl websites and collect domain names along with their availability status.

160 27 160

acm-statistics Liu233w C#

Note: The website has been rewritten at https://github.com/Liu233w/ojhunt-lite

159 16 159

spider Winniekun Python

:star2::octocat: Powered by python3 (simple spider learning projects): Baidu Wenku; NetEase Cloud Music songs; Douban Movie; GitHub; JD.com; QQ Zone; weather; VIP parsing assistant; TED...

158 68 158

KTSpeechCrawler EgorLakomkin Python

Automatically constructs a corpus for automatic speech recognition from YouTube videos

157 38 157

collector thenurhabib Python

Collect XSS-vulnerable parameters from an entire domain.

155 36 155

s3recon clarketm Python

Amazon S3 bucket finder and crawler.

155 57 155

tir pouriya Python

Have time.ir in shell!

154 8 154

ngMeta vinaygopinath JavaScript

Dynamic meta tags in your AngularJS single page application

152 43 152

ComputerStudent sfvsfv HTML

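Systematic study materials for computer science majors (Python, C, C++, computer organization, computer networks, compiler principles, circuits, Google Chrome extensions, crawlers)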

151 55 151

crawler trandoshan-io Go

Go process used to crawl websites

150 21 150

go-crawler lizongying Go

A web crawling framework implemented in Golang, it is simple to write and delivers powerful performance. It comes with a wide range of practical middl...

150 21 150

IpProxyPool wuchunfu Go

An IP proxy pool implemented in Golang. Technologies involved: go gorm proxy proxypool ip crawler mysql viper cobra

150 37 150

not-your-average-web-crawler tijme Python

A web crawler (for bug hunting) that gathers more than you can imagine.

149 36 149

cewler roys Python

CeWLeR - Custom Word List generator Redefined. CeWL alternative in Python, based on the Scrapy framework.

149 17 149

Zhihu-Spider moranzcw Python

A multithreaded Python crawler that fetches profile page information for Zhihu users.

148 50 148

jlitespider luohaha Java

A lite distributed Java spider framework :-)

147 37 147

pachong jin10086 Jupyter Notebook

Some crawler code

147 95 147

pylinkvalidator bartdag Python

pylinkvalidator is a standalone and pure python link validator and crawler that traverses a web site and reports errors (e.g., 500 and 404 errors) enc...

147 36 147