Most popular crawler repositories and open-source projects

authority-data

Official authoritative data: statistical yearbooks, statistical bulletins, internet industry reports, MIIT data, ICT reports...

9   54   54  

alipay-crawler

Alipay bill crawler

16   53   53  

SearchEngineScrapy

Scrape data from Google.com, Bing.com, Baidu.com, Ask.com, Yahoo.com,...

18   53   53  

OpenCrawler

Open Crawler || Open Source Crawler

5   53   53  

findopendata

A search engine for Open Data

6   52   52  

site-mirror-py

[Gitee](https://gitee.com/generals-space/site-mirror-py) General-purpose crawler, site cloning...

18   52   52  

local-api-client-python

Official Python library for interacting with Kameleo Client

3   52   52  

WebTable

A python package that takes tables from a web page and processes them...

2   52   52  

rarbgcli

RARBG command line interface for scraping the rarbg.to torrent search...

10   52   52  

facebook-messenger-bot-tutorial

Facebook Messenger bot tutorial using Python and Django

12   51   51  

fb-page-chat-download

Python script to download messages from a Facebook page to a CSV file

33   51   51  

baidu-chain-dog

Baidu Laici Dog (莱茨狗) crawler.

15   51   51  

GPlayCrawler

13   51   51  

flink-crawler

Continuous scalable web crawler built on top of Flink and crawler-comm...

17   51   51  

price-monitoring

Node.js price monitoring library, leveraging the power of x-ray and ni...

7   51   51  

tw-stock-telegram-bot

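Taiwan stock bot providing real-time quotes for individual stocks and the market index, trends, news, after-hours data, and more. Telegram bo...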

7   51   51  

crawler

Libraries and scripts for crawling the TYPO3 page tree. Used for re-ca...

81   51   51  

crawler_shopee

Shopee coin getter is a script to collect daily shopee coins.

16   51   51  

crawler-userscript

A crawler built on the Tampermonkey extension platform. Its main purpose is to simulate, as closely as possible, the user's...

18   51   51  

open-gov-crawlers

Parse government documents into well formed JSON

4   51   51  

Timbr_V1

A web service that turns an arbitrary web page into structural JSON da...

1   50   50  

bloodhound

36   50   50  

snapcrawl

Crawl a website and take screenshots

10   50   50  

Crawling-CV-Conference-Papers

Crawling CV conference papers with Python.

6   50   50  

kepub

Crawl novels from sfacg, ciweimao, esjzone, lightnovel and masiro; gen...

14   50   50  

html-query

A fluent and functional approach to querying HTML

11   49   49  

nasty

NASTY Advanced Search Tweet Yielder

9   49   49  

scrapy.dart

Scrapy, a fast high-level web crawling & scraping framework for dart a...

8   49   49  

NLP-Twitter

Twitter crawler

8   49   49  

Mini-Spider

A simple, practical crawler tool: create your own crawler in just four steps!

23   48   48  

rolling-news

Fetches rolling news

13   48   48  

DouYinSDK

Douyin SDK for data collection; crawling is no longer just a dream

9   48   48  

unfx-proxy-parser

Unfx Proxy Parser - Nextgen proxy parser with deep links crawler. Foll...

16   48   48  

seonaut

Open source SEO auditing tool.

6   48   48  

tors

⏬ Yet another torrent searching application for your command line

5   47   47  

URLBrute-Py

Tool to brute-force website subdomains and directories.

8   47   47  

httpseed

Cartographer: A new type of seed for the Bitcoin network

25   47   47  

wishlist

Read an Amazon wishlist programmatically with Python

13   47   47  

browser-as-a-service

A web browser 🌎 hosted as a service, to render your Jav...

11   47   47  

crawler_JD_what_worthy_buying

Crawls all reviews of a JD.com product and uses sentiment analysis to judge whether the product is worth buying

13   47   47  

SearchX

A rule-based, cross-platform, one-stop aggregated search tool

15   47   47  

Awesome-Scrapy

A Scrapy-based code repository of data-collection crawlers

16   47   47  

xSMTP

xSMTP 🦟 Lightning fast, multithreaded smtp scanner targeting open-rela...

21   47   47  

codes-scratch-crawler

Reading notes on the book 《自己动手写网络爬虫》 (Write Your Own Web Crawler), with code typed out by hand. It mainly documents the basic...

21   46   46  

scrapy-admin

A django admin site for scrapy

12   46   46  

maman

Rust Web Crawler saving pages on Redis

6   46   46  

crawler

Chromium / Puppeteer site crawler

5   46   46  

gscholar-citations-crawler

Crawl all your citations from Google Scholar

11   46   46  

webrtc-local-ip-leak

Oh no, stop this. You can see my local IP address 😲! Use `foundation`...

5   46   46  

scrapy-kafka-redis

Distributed crawling/scraping, Kafka And Redis based components for S...

13   45   45