Most popular crawler repositories and open source projects

MCPDocSearch alizdavoodi Python

This project provides a toolset to crawl websites, wikis, and tool/library documentation, generate Markdown documentation, and make that documentation se...

38 18 38

auto_crawler_ptt_beauty_image twtrubiks Python

Automatically crawls PTT Beauty board images on a schedule, using Python

37 18 37

NodeSpider Bin-Huang TypeScript

[DEPRECATED] Simple, flexible, delightful web crawler/spider package

37 4 37

toxcrawler JFreegman C

A Tox DHT network crawler

37 12 37

2020-nCov-anhui liuhuanshuo Python

2020 novel coronavirus outbreak data crawling, visualization, and website development and deployment

37 15 37

Facebooker gpwork4u Python

An unofficial Facebook API

37 9 37

Python fmw666 Python

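🍋 Python basics, Pygame game programming, Python algorithms and interview questions, four commonly used Python web frameworks, web crawling, data visualization, and machine learning. Seven major Python areas in total!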

37 6 37

selenium_facebook_scraper Mhmd-Hisham Python

A simple Python 3 script used to download a user's friend list from Facebook.

37 21 37

crawlhtmltopdf osdodo Python

A crawler that converts runoob.com to PDF

37 11 37

xray_pool allanpk716 Go

A proxy pool tool based on Xray-core and glider

37 6 37

instagram-data-scraper Z786ZA

Instagram data scraper that analyzes profiles

37 0 37

WebRecon flashnuke Python

A collection of pentesting web scanners

37 2 37

sneakpeek flulemon Python

Sneakpeek is a framework that helps to quickly and conveniently develop scrapers. It’s the best choice for scrapers that have some specific complex sc...

37 0 37

xhs-js saifeiLee JavaScript

A request wrapper for the Xiaohongshu web client, implemented in JavaScript

37 3 37

NetEaseCloudMusicCrawler timelessmemory Java

HttpClient + Jsoup + Queue

36 14 36

gargantua andreaskoch Go

The fast website crawler

36 3 36

imooc-crawler monkeym4ster JavaScript

[Obsolete] imooc web crawler written in Node.js

36 15 36

golearn hackfengJam Go

🔥 Golang basics and hands-on practice (including: crawler, distributed-systems, data-analysis, redis, etcd, raft, crontab-task)

36 11 36

vw-crawler vector4wang Java

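:beetle: A simple, lightweight Java crawler framework. With just a little knowledge of basic regular expressions and CSS selectors, you can collect data easily.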

36 18 36

shadow_spider gzm1997 Python

36 13 36

InstaBot drbuche Python

Simple and friendly Bot for Instagram, using Selenium and Scrapy with Python.

36 11 36

node-html-crawler safonovpro JavaScript

Simple-to-use Node HTML crawler (spider) for website pages

36 9 36

fess-crawler codelibs Java

Web/FileSystem Crawler Library

36 16 36

igxe-c5-buff-csgo-skins-sale-data-catch wolverinn Python

Automatically get CS:GO skin sale data from igxe.cn, buff, and c5game.com. You can choose the specific skins to get data for.

36 2 36

PyperGrabber pykong Python

Fetches PubMed article IDs (PMIDs) from email inbox, then crawls PubMed, Google Scholar and Sci-Hub for respective PDF files.

36 8 36

crazyDhtSpider ixiaofeng PHP

A PHP DHT crawler based on Swoole, with remarkably high efficiency

36 22 36

medup miry Crystal

Download all content from Medium and Dev.to to a local folder

36 9 36

Web-Crawler kshru9 C++

A multithreaded web crawler using two mechanisms: a single lock and thread-safe data structures

36 11 36

filecrawler helviojunior Python

File Crawler indexes files and searches for hard-coded credentials

36 10 36

proxy-in-a-box naiba Go

Automatic proxy pool for web scraping - crawls, validates, and rotates proxies with rate limiting and MITM support

36 3 36

Python-scraper-tutorial Decodo Python

A short introduction to scraping with Python with given steps and an example scraper script.

36 8 36

PixivCrawlerIII Neod0Matrix Python

A Python 3 crawler for downloading top-ranked Pixiv works and all artworks by any illustrator

35 9 35

MMDownloader occidere Java

New project for the MaruMaru downloader

35 9 35

TaobaoAnalysis xfgryujk Python

A project for practicing NLP by analyzing Taobao reviews

35 6 35

Youtube_Comment_Crawler SOMJANG Jupyter Notebook

YouTube comment crawler (Python, BeautifulSoup, Selenium)

35 12 35

Mini-Projects nazaninsbr Python

A collection of short projects. You could implement them as standalone projects or use them as part of a larger project.

35 14 35

emarketcrawlR wagnertimo R

This R package provides a crawler to scrape the European Energy Market EPEX SPOT at https://www.epexspot.com and the European Energy Exchange at https...

35 10 35