Most popular crawler repositories and open source projects

SeleniumDemo

Selenium automation test framework

93   84   84  

Amazon-Price-Alert

Price tracker for Amazon

28   84   84  

pagser

Pagser is a simple, extensible, configurable parse and deserialize htm...

7   84   84  

bathyscaphe

Fast, highly configurable, cloud native dark web crawler.

24   83   83  

crawler-chrome-extensions

Chrome extensions used by crawler dev...

14   83   83  

is-google

Verify that a request is from Google crawlers using Google's DNS verif...

7   82   82  

Hands-on-WebScraping

This repo is a part of blog series on several web scraping projects wh...

73   82   82  

weibo-scraper

Simple Weibo Scraper

18   82   82  

Proxy-List-Scrapper

Proxy list scraper

18   82   82  

XVideos-PornHub-RedTube-API

This script scrapes the HTML from different web pages to get the infor...

31   81   81  

ceiba-dl

NTU CEIBA data download tool

11   80   80  

random_user_agent

A package to get list of user agents based on filters such as operatin...

12   80   80  

puppeteer-walker

a puppeteer walker 🕷 🕸

11   79   79  

Novel-crawler

A novel crawler written in Python

27   79   79  

deepweb-scappering

Discover hidden deepweb pages

16   79   79  

tumblr_crawler

Tumblr parsing website

43   78   78  

arachnid

Powerful web scraping framework for Crystal

11   78   78  

crawler_examples

Some classic web crawler projects.

31   77   77  

scrapy-examples

Some Scrapy and web.py examples

32   77   77  

ctrip_spider

Learning web scraping (Ctrip)

34   76   76  

tumblr-crawler-cli

Tumblr Download Tool with High Speed and Customization. High-performance & highly customizable...

15   76   76  

fetchman

fetchman is a simple, easy-to-use crawler framework

20   76   76  

BUbiNG

The LAW next generation crawler.

24   76   76  

WebSecurityArticles

Crawls and organizes high-quality "web security" articles from sites such as Freebuf, Anquanke (安全客), Xianzhi (先知), and Knownsec (知道创宇)

20   76   76  

light-crawler

a simplified directed customizable website crawler

23   75   75  

tg_crawler

Just a crawler based on tg-cli for Telegram. Deprecated by now, please...

21   75   75  

venom

Your preferred open source focused crawler for the deep web.

5   74   74  

BOJ-AutoCommit

When you solve a problem on Baekjoon Online Judge, it automatically...

12   74   74  

python-tools

A collection of Python tools, scripts and utilities to make your life...

18   74   74  

simpyder

Ultra-fast asynchronous, coroutine-based Python crawler

23   74   74  

feedsearch-crawler

Crawl sites for RSS, Atom, and JSON feeds.

9   74   74  

fund-crawler

A Node.js-based fund data crawler; the crawled data is stored on GitHub at @nullpointer/fund-data...

42   73   73  

spider

Python crawler spider

26   72   72  

python-testing-crawler

A crawler for automated functional testing of a web application

5   72   72  

crawlzone

Crawlzone is a fast asynchronous internet crawling framework for PHP.

9   72   72  

lrabbit_scrapy

A quick-start Python multi-threaded crawler

1   72   72  

achoz

Search through all your personal data efficiently like web search.

4   72   72  

COI

Practice project: Comment of Interest, e-commerce text review data mining (crawler + opinion extraction +...

11   71   71  

Instagram-downloader

Instagram user's photos and videos downloader. Download all media file...

18   71   71  

car-prices

Golang crawler that scrapes the Autohome (汽车之家) used-car product database

37   70   70  

IpProxyPool

An IP proxy pool implemented in Golang; technologies involved: go gorm proxy proxypool ip cr...

26   70   70  

darc

Darkweb Crawler Project

13   69   69  

python-crawler

Python Crawler

52   69   69  

tiktok-scraper-php

Tiktok (Musically) PHP scraper

30   69   69  

robotstxt

robots.txt file parsing and checking for R

8   69   69  

ComicSpider

Crawler for original-resolution images from the desktop version of the Dongman Zhijia (动漫之家) comics site

17   68   68  

Wedge

Configurable novel downloader and e-book generator

21   67   67  

hproxy

hproxy - Asynchronous IP proxy pool, aims to make getting proxy as con...

13   66   66  

newspaperjs

News extraction and scraping. Article Parsing

19   66   66  

Pasta

A Pastebin scraper that doesn't rely on the Pastebin scraping API

6   66   66