Topic

crawler

Repositories (1232)

Beanbun
Beanbun kiddyuchina PHP

Beanbun is a multi-process web crawler framework written in PHP and built on Workerman, designed to be open and highly extensible.

1.2k
bilili
bilili yutto-dev Python

:beers: bilibili video (including bangumi) and danmaku downloader

1.2k
tumblr-crawler
tumblr-crawler dixudx Python

Easily download all the photos and videos from specified Tumblr blogs.

1.1k
fess
fess codelibs Java

Fess is a very powerful and easily deployable Enterprise Search Server.

1.1k
instagram-profilecrawl
instagram-profilecrawl InstaPy Python

📝 quickly crawl the information (e.g. followers, tags etc...) of an instagram profile.

1k
crawly
crawly elixir-crawly Elixir

Crawly, a high-level web crawling & scraping framework for Elixir.

1k
grab-site
grab-site ArchiveTeam Python

The archivist's web crawler: WARC output, dashboard for all crawls, dynamic ignore patterns

1k
sqliv
sqliv the-robot Python

massive SQL injection vulnerability scanner

1k
mzitu
mzitu chenjiandongx Python

👧 Crawler for model photo-album sets (part two)

1k
Bili23-Downloader
Bili23-Downloader ScottSloan Python

A cross-platform Bilibili video downloader for Windows, Linux, and macOS that downloads Bilibili videos, bangumi, movies, documentaries, and other resources.

976
kimuraframework
kimuraframework vifreefly Ruby

Kimurai is a modern web scraping framework written in Ruby which works out of the box with Headless Chromium/Firefox, PhantomJS, or simple HTTP requests a...

971
ast-hook-for-js-RE
ast-hook-for-js-RE JSREI JavaScript

A browser memory-roaming solution (work in progress...)

967
Pxer
Pxer FoXZilla JavaScript

A tool for pixiv.net. A Pixiv crawler that anyone can use.

959
crawler
crawler fredwu Elixir

A high performance web crawler / scraper in Elixir.

948
SpiderSuite
SpiderSuite spidersuite

SpiderSuite releases, wiki and roadmap

946
stormcrawler
stormcrawler apache Java

A scalable, mature and versatile web crawler based on Apache Storm

931
SecCrawler
SecCrawler Le0nsec Go

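A crawler and push-notification program that helps security researchers get their daily security digest. It currently crawls Xianzhi (先知社区), Anquanke (安全客), Seebug Paper, Tiaotiaotang (跳跳糖), the QiAnXin attack-defense community (奇安信攻防社区), Lengjiao (棱角社区), NSFOCUS (绿盟), 腾讯玄...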

923
zhihu-crawler
zhihu-crawler wycm Java

zhihu-crawler is a high-performance, distributed, Java-based crawler project that supports a free HTTP proxy pool and horizontal scaling.

919
BaiduImageSpider
BaiduImageSpider kong36088 Python

A super lightweight Baidu Images crawler.

913
TumblThree
TumblThree johanneszab C#

A Tumblr Blog Backup Application

911
chatWeb
chatWeb SkywalkerDarren Python

ChatWeb can crawl web pages, read PDF, DOCX, TXT, and extract the main content, then answer your questions based on the content, or summarize the key...

910
magnet-dht
magnet-dht chenjiandongx Python

✌️ Python3 BitTorrent DHT crawler

907
scrapyrt
scrapyrt scrapinghub Python

HTTP API for Scrapy spiders

864
XSRFProbe
XSRFProbe 0xInfection Python

The Prime Cross Site Request Forgery (CSRF) Audit and Exploitation Toolkit.

862
skrape.it
skrape.it skrapeit Kotlin

A Kotlin-based testing/scraping/parsing library providing the ability to analyze and extract data from HTML (server & client-side rendered). It places...

852
spidr
spidr postmodern Ruby

A versatile Ruby web spidering library that can spider a site, multiple domains, certain links or infinitely. Spidr is designed to be fast and easy to...

815
till
till DataHenHQ Go

DataHen Till is a companion tool to your existing web scraper that instantly makes it scalable, maintainable, and more unblockable, with minimal code...

814
course-crawler
course-crawler Foair Python

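🎓 MOOC course downloader for China University MOOC (中国大学MOOC), XuetangX (学堂在线), NetEase Cloud Classroom (网易云课堂), CNMOOC (好大学在线), and iCourse (爱课程).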

809
Lulu
Lulu iawia002 Python

[Unmaintained] A simple and clean video/music/image downloader 👾

807
easy-scraping-tutorial
easy-scraping-tutorial MorvanZhou Jupyter Notebook

Simple but useful Python web scraping tutorial code.

802
pic-gather
pic-gather Licoy

🛑 An image collector that supports custom source configuration and runs on macOS and Windows.

801
sperm
sperm darbra

A roundup of excellent reverse-engineering articles I have read; worth a look.

791
creeper
creeper wspl Go

:paw_prints: Creeper - The Next Generation Crawler Framework (Go)

780
fetchbot
fetchbot PuerkitoBio Go

A simple and flexible web crawler that follows the robots.txt policies and crawl delays.

774
BaiduSpider
BaiduSpider BaiduSpider Python

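BaiduSpider, a crawler for Baidu search results. It currently supports Baidu web search, Baidu Images search, Baidu Zhidao search, Baidu video search, Baidu news search, Baidu Wenku search, Baidu Jingyan search, and 百...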

765
icrawler
icrawler hellock Python

A multi-thread crawler framework with many builtin image crawlers provided.

759
xxl-crawler
xxl-crawler xuxueli Java

A lightweight Java web crawler framework.

740
seo-audits-toolkit
seo-audits-toolkit StanGirard Python

SEO & Security Audit for Websites. Lighthouse & Security Headers crawler, Sitemap/Keywords/Images Extractor, Summarizer, etc ...

737
PyPtt
PyPtt PyPtt Python

The best PTT library

722
TumblThree
TumblThree TumblThreeApp C#

A Tumblr and Twitter Blog Backup Application

715
bookcorpus
bookcorpus soskek Python

Crawl BookCorpus

694
xeHentai
xeHentai fffonion Python

Doujinshi downloader

692
linkedin-profile-scraper-api
linkedin-profile-scraper-api josephlimtech TypeScript

🕵️‍♂️ LinkedIn profile scraper returning structured profile data in JSON.

680
ArrowDL
ArrowDL setvisible C++

ArrowDL (Arrow Downloader) is a download manager for Windows, macOS, and Linux

670
FileMasta
FileMasta ohhsodead C#

A search application to explore, discover and share online files

668
crawler
crawler kgspider JavaScript

Crawler code shared by K 哥 (Brother K): JS reverse engineering and advanced crawling. Follow the WeChat official account: K哥爬虫

662
gOSINT
gOSINT Nhoya Go

OSINT Swiss Army Knife

649
Scavenger
Scavenger rndinfosecguy Python

Crawler (Bot) searching for credential leaks on paste sites.

649
NetDiscovery
NetDiscovery fengzhizi715 Java

NetDiscovery is a general-purpose crawler framework/middleware built on Vert.x, RxJava 2, and other frameworks.

647
spider_collection
spider_collection srx-2000 Python

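Python crawlers. Current collection: NetEase Cloud Music song crawler, Bilibili video crawler, Zhihu Q&A crawler, wallpaper crawler, xvideos video crawler, audiobook crawler, Weibo crawler, Anjuke listing crawler with data visualization, 哔哩...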

647