Topic

crawler

Repositories (1431)

tsec
tsec Asoul

Crawler for Taiwan exchange-listed and OTC stocks (Taiwan Stock Exchange Crawler)

375
google-news-scraper
google-news-scraper lewisdonovan TypeScript

Lightweight scraper for Google News

373
JSSoup
JSSoup chishui JavaScript

JavaScript + BeautifulSoup = JSSoup

372
nebula
nebula dennis-tra Go

🌌 An agnostic network crawler exposing comprehensive peer information and network topology information.

370
crawler
crawler crwlrsoft PHP

Library for Rapid (Web) Crawler and Scraper Development

369
weixin-spider
weixin-spider xzkzdx Python

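WeChat Official Account crawler: historical articles, article comments, article read counts and "Wow" (在看) data, with a visual web page; deployable on Windows servers. Built on Python 3 with flask/mysql/redis/mitmproxy/pywin32, etc...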

368
CrawlerTutorial
CrawlerTutorial leVirve Python

Minimal crawler tutorial (fetch, parse, search, multiprocessing, API), using PTT as the example

367
scrapy-zyte-smartproxy
scrapy-zyte-smartproxy scrapy-plugins Python

Zyte Smart Proxy Manager (formerly Crawlera) middleware for Scrapy

365
news-crawl
news-crawl commoncrawl Java

News crawling with StormCrawler - stores content as WARC

365
javbus-api
javbus-api ovnrain TypeScript

A self-hosted JavBus API service

364
hQuery.php
hQuery.php duzun PHP

An extremely fast web scraper that parses megabytes of invalid HTML in a blink of an eye. PHP5.3+, no dependencies.

363
Rcrawler
Rcrawler salimk R

An R web crawler and scraper

363
QQMusicSpider
QQMusicSpider yangjianxin1 Python

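Scrapy-based QQ Music crawler (QQ Music Spider) that scrapes song information, lyrics, featured comments, and more; also shares a music corpus of 490,000+ items from the top 6,400 mainland China, Hong Kong, and Taiwan artists on QQ Music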

363
copymanga-downloader
copymanga-downloader misaka10843 Python

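Downloads manga from copymanga (拷贝漫画) using Python and the copymanga API (no rate limit); supports batch and per-chapter downloads, fetching and downloading your favorited manga, and semi-automatic subscription downloads! (All platforms supported (pypi...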

361
nudecrawler
nudecrawler yaroslaff Python

Crawl telegra.ph searching for nudes!

360
chinese-fund-crawler
chinese-fund-crawler jackluson Python

Data crawling and aggregate analysis for Chinese off-exchange (OTC) funds

360
zhihu-login
zhihu-login zkqiang Python

Simulated Zhihu login, with support for extracting CAPTCHAs and saving cookies

357
vyntr
vyntr outpoot TypeScript

Independent search engine. Includes web crawling, search indexing, dictionary API, and more. https://vyntr.com

355
spidy
spidy rivermont Python

The simple, easy to use command line web crawler.

354
telegram-crawler
telegram-crawler MarshalX Python

🕷 Automatically detect changes made to the official Telegram sites, clients and servers.

349
91porn-api
91porn-api colikno JavaScript

🌭💦 91porn crawler: unrestricted online API (permanently valid, passphrase updated daily) and online web preview

344
ppspider
ppspider xiyuan-fengyu TypeScript

Web spider built on puppeteer; supports task queues and task scheduling via decorators, nedb / mongodb, and data visualization; based on puppetee...

339
xcrawler
xcrawler yan68 PHP

A fast, concise, and powerful PHP crawler framework

338
sitemap-generator-cli
sitemap-generator-cli lgraubner JavaScript

Creates an XML-Sitemap by crawling a given site.

337
lightnovel_epub
lightnovel_epub JeffersonQin Python

🍭 epub generator for (light) novels. Supported sites: 轻之国度, 轻小说文库

337
tiktok-downloader
tiktok-downloader krypton-byte Python

Tiktok Downloader/Scraper using requests & bs4

337
crawley
crawley s0rg Go

The unix-way web crawler

337
polite
polite dmi3kno R

Be nice on the web

335
Hydra
Hydra DragonKingpin Java

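A one-person middle platform for "super individuals" and one-person companies. Hydra (the nine-headed dragon) builds a "military" industrial base for large-scale AI scheduling, data collection, intelligence systems, data platforms, analysis and decision-making, and product production.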

332
awesome-java-crawler
awesome-java-crawler rockswang

This repository collects and organizes crawler-related resources, with Java as the main development language

325
Laravel-Crawler-Detect
Laravel-Crawler-Detect JayBizzle PHP

A Laravel wrapper for CrawlerDetect - the web crawler detection library

323
wencai
wencai GraySilver JavaScript

This is a wencai crawler. (A Pythonic toolkit for the strategy backtesting interface of i问财.)

323
oddish
oddish puppylpg Python

Crawl csgo skin info from `buff.163.com` and steam, then find the most suitable one to buy from the former and to sell to the latter.

322
crawler
crawler infinilabs Go

🕷️ An easy-to-use spider written in Golang. (Previously named GOPA.)

314
scrapper
scrapper amerkurev Python

Web scraper with a simple REST API living in Docker and using a Headless browser and Readability.js for parsing.

314
extractor
extractor lightfeed TypeScript

Use LLMs to robustly extract web data

314
4chan-downloader
4chan-downloader Exceen Python

Python3 script to continuously download all images/videos of multiple 4chan threads simultaneously - without installation

310
Python-Web-Scraping-Tutorial
Python-Web-Scraping-Tutorial oxylabs Python

In this Python Web Scraping Tutorial, we will outline everything needed to get started with web scraping. We will begin with simple examples and move...

303
weibo-topic-spider
weibo-topic-spider czy1999 Python

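Weibo Super Topic crawler; Weibo word-frequency statistics, sentiment analysis, and simple classification; newly added data crawled from the pneumonia Super Topic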

302
Fast-LianJia-Crawler
Fast-LianJia-Crawler CaoZ Python

A blazing-fast crawler that fetches data directly through the Lianjia API, the fastest in the universe~~ 🚀

299
bitextor
bitextor bitextor Python

Bitextor generates translation memories from multilingual websites

299
js-reverse
js-reverse freedom-wy HTML

JS reverse-engineering research

299
line-bot-tutorial
line-bot-tutorial twtrubiks Python

line-bot-tutorial using Python Flask

298
crawler_shopee_public
crawler_shopee_public hsuanchi Python

Shopee asynchronous crawler + competitor seller analysis

298
Sasila
Sasila da2vin Python

A flexible, friendly crawler framework

297
pychromeless
pychromeless jairovadillo Python

Python Lambda Chrome Automation (naming pending)

294
site-audit-seo
site-audit-seo viasite JavaScript

Web service and CLI tool for SEO site audit: crawl site, lighthouse all pages, view public reports in browser. Also output to console, json, csv, xlsx

292
ComicCrawler
ComicCrawler eight04 Python

An image crawler written in Python.

291
PulsarRPA
PulsarRPA platonai Kotlin

Automate webpages at scale, scrape web data completely and accurately with high performance, distributed RPA.

287
Gorecon
Gorecon devanshbatham Go

Gorecon is an all-in-one reconnaissance tool, a.k.a. a Swiss knife for reconnaissance, a tool that every pentester/bughunter might wanna consider into...

285