Topic

crawler

Repositories (1232)

learnPython
learnPython rieuse Python

Basic Python practice code and assorted crawler code

644
Weibo-Analyst
Weibo-Analyst KimMeen Python

Social media (Weibo) comment analysis toolbox in Chinese. Features: 1. Weibo comment data crawling; 2. word segmentation and keyword extraction; 3. word clouds and word-frequency statistics; 4.情...

640
runoob-PDF-
runoob-PDF- gagayuan Python

Crawls the runoob (菜鸟教程) tutorial site and converts it to PDF__python_crawer_by_chrome

635
fbcrawl
fbcrawl rugantio Python

A Facebook crawler

624
DouYin
DouYin Python3WebSpider Python

API of DouYin for Humans used to Crawl Popular Videos and Music

621
dotcommon
dotcommon Kharacternyk Python

What do people have in their dotfiles?

620
go_jobs
go_jobs go-crawler Go

A look at the Golang job market

612
newcrawler
newcrawler speed JavaScript

Free Web Scraping Tool with Java

582
google-play-scraper
google-play-scraper JoMingyu Python

Google play scraper for Python inspired by <facundoolano/google-play-scraper>

582
pywebcopy
pywebcopy rajatomar788 Python

Locally saves webpages to your hard disk with images, css, js & links as is.

575
scrapedin
scrapedin linkedtales JavaScript

LinkedIn Scraper (currently working 2020)

566
jvppeteer
jvppeteer fanyong920 Java

Headless Chrome For Java (Java crawler)

557
XHS-Spider
XHS-Spider xisuo67 C#

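Xiaohongshu (小红书) data collection and batch download tool for website images and video resources; a great-looking data collection tool (batch download, video extraction, images, watermark removal, etc.)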

556
FictionDown
FictionDown ma6254 Go

Novel download | novel crawling | Qidian (起点) | Biquge (笔趣阁) | Markdown export | txt export | epub conversion | ad filtering | automatic proofreading

548
webster
webster zhuyingda JavaScript

a reliable high-level web crawling & scraping framework for Node.js.

540
vault
vault abhisharma404 Python

swiss army knife for hackers

533
Python3Webcrawler
Python3Webcrawler mochazi Python

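🌈 Python3 web crawling in practice: QQ Music songs, JD.com product information, Fang.com (房天下), cracking Youdao Translate, building a proxy pool, Douban Books, Baidu Images, cracking NetEase login, simulated QR-code login for Bilibili, Xiaoe-tech (小鹅通), Lizhi Weike (荔枝微课)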

531
nintendo-switch-eshop
nintendo-switch-eshop lmmfranco TypeScript

Crawler for Nintendo Switch eShop

527
crawljax
crawljax crawljax Java

Crawljax

526
hacker-news-digest
hacker-news-digest polyrabbit Python

📰 Let ChatGPT Summarize Hacker News for You

522
Scan-T
Scan-T nanshihui C

A new Python-based crawler with more functions, including network fingerprint search

510
scrapple
scrapple AlexMathew Python

A framework for creating semi-automatic web content extractors

502
opensearchserver
opensearchserver jaeksoft Java

Open-source Enterprise Grade Search Engine Software

488
python-fxxk-spider
python-fxxk-spider ityard

A collection of free Python crawler projects

481
Html2Article
Html2Article stanzhai C#

HTML web page main-content extraction

476
python-automation-scripts
python-automation-scripts avidLearnerInProgress Python

Simple yet powerful automation stuff.

466
mmjpg
mmjpg chenjiandongx Python

👩 Crawler for photo sets of beautiful women (part 1)

462
ICLR2020-OpenReviewData
ICLR2020-OpenReviewData shaohua0116 Jupyter Notebook

Script that crawls metadata from the ICLR OpenReview webpage. Tutorials on installing and using Selenium and ChromeDriver on Ubuntu.

461
freshonions-torscraper
freshonions-torscraper dirtyfilthy Python

Fresh Onions is an open source TOR spider / hidden service onion crawler hosted at zlal32teyptf4tvi.onion

457
Youtube-Projects
Youtube-Projects ayushi7rawat Python

This repository contains all the code I use in my YouTube tutorials.

433
Pinkerton
Pinkerton 0xdsm Python

🕵️ Python project to crawl for JavaScript files and search for secrets like API keys, authorization tokens, hardcoded credentials, etc.

429
dude
dude roniemartinez Python

dude uncomplicated data extraction: A simple framework for writing web scrapers using Python decorators

428
signature_algorithm
signature_algorithm gadfly0x Python

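Request signing or encryption algorithms for various apps, mini programs, and websites. Currently covers: Ziroom (自如), Xiaohongshu (小红书), Danke Apartment (蛋壳公寓), luckin coffee, bangkokair (Bangkok Airways)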

418
media-scraper
media-scraper elvisyjlin Python

Scrapes all photos and videos in a web page / Instagram / Twitter / Tumblr / Reddit / pixiv / TikTok

411
seonaut
seonaut StJudeWasHere Go

Open source SEO audit tool.

407
music-recover
music-recover heqin-zhu Python

🎵 Converts cached files to MP3 files

406
second-order
second-order mhmdiaa Go

Second-order subdomain takeover scanner

405
jivesearch
jivesearch jivesearch JavaScript

A search engine that doesn't track you.

402
tsrtc
tsrtc Asoul JavaScript

Real-time crawler for Taiwan stocks. Taiwan Stock Exchange Real Time Crawler

400
videodl
videodl CharlesPikachu Python

Videodl: A lightweight video downloader written by pure python.

392
ghcrawler
ghcrawler microsoft JavaScript

Crawl GitHub APIs and store the discovered orgs, repos, commits, ...

390
ICLR2019-OpenReviewData
ICLR2019-OpenReviewData shaohua0116 Jupyter Notebook

Script that crawls metadata from the ICLR OpenReview webpage. Tutorials on installing and using Selenium and ChromeDriver on Ubuntu.

387
TTBot
TTBot 01ly Python

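Toutiao (今日头条) bot supporting user login, follow, unfollow, fetching followings and followers, posting articles, posting Wukong Q&A, liking, commenting, collecting various types of news, and more; implemented with the Toutiao web API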

377
CrawlerForReader
CrawlerForReader smuyyh Java

Android local web-novel crawler, based on jsoup and xpath

374
JSSoup
JSSoup chishui JavaScript

JavaScript + BeautifulSoup = JSSoup

372
webpalm
webpalm XORbit01 Go

🕸️ Crawl in the web network

371
lxBook
lxBook lixi5338619 JavaScript

Code repository for the book 《爬虫逆向进阶实战》 (Advanced Crawler Reverse Engineering in Practice)

370
InstagramCrawler
InstagramCrawler tzuhsial Python

A non API python program to crawl public photos, posts or followers

368
weixin-spider
weixin-spider xzkzdx Python

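WeChat official account crawler: historical articles, article comments, article read and "在看" counts, visual web page, deployable on Windows servers. Based on Python3 with flask/mysql/redis/mitmproxy/pywin32, etc...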

368
scrapy-zyte-smartproxy
scrapy-zyte-smartproxy scrapy-plugins Python

Zyte Smart Proxy Manager (formerly Crawlera) middleware for Scrapy

364