Topic

crawler

Repositories (1431)

guilhermecgs/ir (Python)

Project for automatically calculating income tax (Imposto de Renda) on trades on the Bovespa. Tags: canal eletronico do investidor, CEI, selenium, bovespa, IRPF, IR,...

174 stars

Jiramew/spoon (Python)

🥄 A package for building site-specific proxy pools.

173 stars

Jacen789/HotNewsAnalysis (Python)

Analysis of trending news topics and issues of public concern using text-mining techniques.

172 stars

qwertyuiop6/mm131 (Python)

Image scraper for the MM131 website 🚨

171 stars

duckduckgo/smarter-encryption (Perl)

171 stars

cytopia/urlbuster (Python)

Powerful mutable web directory fuzzer for brute-forcing existing and/or hidden files or directories.

169 stars

codeudan/crawler-china-mainland-universities (JavaScript)

Crawler for the list of universities in mainland China.

167 stars

solarhell/font_obfuscator (Rust)

Font obfuscation service.

167 stars

adbar/courlan (Python)

Clean, filter and sample URLs to optimize data collection: Python & command-line; deduplication, spam, content and language filters.

167 stars

FLZ101/dl_coursera (Python)

A simple, fast and reliable Coursera crawling & downloading tool.

166 stars

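Crawler995/DouyuBarrage-Pro (TypeScript)

(Latest, 2020) Douyu danmaku (live-comment) scraping and visualization management platform, version 2. Provides danmaku scraping, real-time visualization of comment-sending rate, scrape-history queries, danmaku download, custom keyword statistics, loyal-fan statistics, automatic highlight-moment...

166 stars
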
Joel2B/XVideos-PornHub-RedTube-API (PHP)

This script scrapes the HTML of various web pages to extract video information (XVideos, PornHub, RedTube), and you can use it in your own...

166 stars

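XWang20/WeiboCrawler (Python)

Cookie-free Weibo crawler that can continuously crawl one or more Sina Weibo users' profile information, their posts, and the comments and reposts on those posts.

166 stars
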
h4m5t/NLP-Twitter (Python)

Twitter crawler.

165 stars

dev-techmoe/python-dcdownloader (Python)

A fully asynchronous batch manga downloader (crawler) for 动漫之家 (dmzj), written in Python.

164 stars

xjtushilei/ScriptSpider (Java)

A componentized, distributed, general-purpose crawler framework written in Java.

164 stars

abaykan/CrawlBox (Python)

An easy way to brute-force web directories.

163 stars

chenjiandongx/soksaccounts (Python)

🔥 Shadowsocks account crawler.

162 stars

ZhangBohan/fun_crawler (Python)

Crawls some pictures for fun.

161 stars

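2young2simple/yispider (Go)

A distributed crawler platform that helps you manage and develop crawlers more effectively. It includes a built-in set of crawler definition rules (templates) for quickly defining crawlers, and can also be used as a framework for writing crawlers by hand. (A hobby project, using...

161 stars
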
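stulzq/HttpCode.Core (C#)

Simple, easy to use, and efficient: an open-source .NET HTTP request framework with attitude! It can be used to build crawlers, make API requests, and more.

161 stars
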
go-crawler/douban-movie (Go)

Golang crawler that scrapes the Douban Movie Top 250.

161 stars

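yutao8/starred (HTML)

A personal collection of popular GitHub projects (1.8k+), covering development frameworks, components, SDKs, templates, APIs, IPTV, scripts, crawlers, direct cloud-drive download links, open-source software, tools, and more.

160 stars
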
duckduckgo/tracker-radar-collector (JavaScript)

🕸 Modular, multithreaded, Puppeteer-based crawler.

160 stars

twiny/spidy (Go)

Domain name collector: crawls websites and collects domain names along with their availability status.

160 stars

Liu233w/acm-statistics (C#)

Note: the website has been rewritten in https://github.com/Liu233w/ojhunt-lite

159 stars

Winniekun/spider (Python)

🌟 Powered by Python 3 (simple crawler-learning exercises): Baidu Wenku; NetEase Cloud Music songs; Douban Movies; GitHub; JD.com; QQ Zone; weather; VIP parsing assistant; TED文...

158 stars

EgorLakomkin/KTSpeechCrawler (Python)

Automatically constructs a corpus for automatic speech recognition from YouTube videos.

157 stars

thenurhabib/collector (Python)

Collects XSS-vulnerable parameters from an entire domain.

155 stars

clarketm/s3recon (Python)

Amazon S3 bucket finder and crawler.

155 stars

pouriya/tir (Python)

Have time.ir in your shell!

154 stars

vinaygopinath/ngMeta (JavaScript)

Dynamic meta tags in your AngularJS single-page application.

152 stars

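sfvsfv/ComputerStudent (HTML)

Systematic study materials for computer science majors (Python, C, C++, computer organization, computer networks, compiler principles, circuits, Chrome extensions, crawlers).

151 stars
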
trandoshan-io/crawler (Go)

Go process used to crawl websites.

150 stars

lizongying/go-crawler (Go)

A web crawling framework implemented in Golang; it is simple to write and delivers powerful performance. It comes with a wide range of practical middl...

150 stars

wuchunfu/IpProxyPool (Go)

An IP proxy pool implemented in Golang. Technologies involved: go gorm proxy proxypool ip crawler mysql viper cobra

150 stars

tijme/not-your-average-web-crawler (Python)

A web crawler (for bug hunting) that gathers more than you can imagine.

149 stars

roys/cewler (Python)

CeWLeR: Custom Word List generator Redefined. A CeWL alternative in Python, based on the Scrapy framework.

149 stars

moranzcw/Zhihu-Spider (Python)

A multithreaded Python crawler that retrieves information from Zhihu users' profile pages.

148 stars

luohaha/jlitespider (Java)

A lite distributed Java spider framework :-)

147 stars

jin10086/pachong (Jupyter Notebook)

Some crawler code.

147 stars

bartdag/pylinkvalidator (Python)

pylinkvalidator is a standalone, pure-Python link validator and crawler that traverses a website and reports errors (e.g., 500 and 404 errors) enc...

147 stars

karthikuj/sasori (JavaScript)

Sasori is a dynamic web crawler powered by Puppeteer, designed for lightning-fast endpoint discovery.

146 stars

just-every/mcp-read-website-fast (TypeScript)

Quickly reads webpages and converts them to Markdown for fast, token-efficient web scraping.

145 stars

havanagrawal/GoodreadsScraper (Python)

Scrape data from Goodreads using Scrapy and Selenium 📚

145 stars

loadkpi/crawler_detect (Ruby)

Ruby gem to detect bots and crawlers via the user agent.

144 stars

cwjokaka/bilibili_member_crawler (Python)

Bilibili user crawler. Yay~ it's a crawler!

143 stars

egoist/taki (TypeScript)

Take a snapshot of any website.

143 stars

rookmoot/proxifier (Go)

A fast, modern and intelligent proxy rotator, perfect for crawling and scraping public data.

143 stars

WuKongSecurity/SpiderBOX (CSS)

SpiderBox (虫盒): a directory site of resources for crawler reverse engineering.

142 stars