Most popular crawler repositories and open source projects

uforall rix4uni Go

uforall is a fast URL crawler that collects URLs from a number of different sources: AlienVault, Wayback Machine, urlscan, and Common Crawl

54 11 54

flink-crawler kkrugler Java

Continuous scalable web crawler built on top of Flink and crawler-commons

53 18 53

browser-as-a-service hfreire JavaScript

A web browser 🌎 hosted as a service, to render your JavaScript web pages as HTML

53 11 53

PageParser mouday Python

A web page parser for web crawlers; write crawlers even if you don't know how to parse web pages

53 17 53

python-scrapfly scrapfly Python

Scrapfly Python SDK for headless browsers and proxy rotation

53 15 53

fb-page-chat-download eisenjulian Python

Python script to download messages from a Facebook page to a CSV file

52 31 52

MahjongKit erreurt Python

Riichi Mahjong Kit: (1) Game log crawler (sqlite3, json, bs4); (2) Game log preprocessor; (3) Deterministic algorithms library

52 9 52

SearchX LanyuanXiaoyao-Studio Vue

A rule-based, cross-platform, one-stop aggregated search tool

52 15 52

go-crawler-distributed golang-collection Go

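A distributed crawler project. It supports secondary development of customized page parsers, uses a microservice architecture overall, and sends messages asynchronously through a message queue. Frameworks used include: redigo, gorm, goquer...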

52 8 52

Deepminer Conso1eCowb0y Python

Deep web crawler and search engine

52 12 52

thecrowler pzaino Go

A Content Discovery and Development Platform. Empowering Cybersecurity, AI, Marketing, and Finance professionals and researchers to discover, analyze,...

52 11 52

baidu-chain-dog CoolAcsi Java

Crawler for Baidu's 莱茨狗 (Laici Dog).

51 15 51

GPlayCrawler KopLyf Python

51 13 51

alipay-crawler he426100 PHP

Alipay bill crawler

51 16 51

scrapy.dart sachaarbonel Dart

Scrapy, a fast high-level web crawling & scraping framework for Dart and Flutter

51 7 51

TwitterCrawler casolxia Java

Twitter crawler that scrapes Twitter data, filterable by criteria such as time, topic, and username

51 14 51

usetube valerebron TypeScript

Search and get data from YouTube; no Google account needed

51 18 51

tech-stack-datasets leadita

Open datasets of companies & websites grouped by the technologies they use (CSV & JSON). Discover who uses Shopify, Stripe, WooCommerce, HubSpot, and more...

51 8 51

facebook-messenger-bot-tutorial twtrubiks Python

A Facebook Messenger bot tutorial using Python and Django

50 13 50

Timbr_V1 lvyachao JavaScript

A web service that turns an arbitrary web page into structural JSON data and easy-to-use APIs with just a few clicks

50 1 50

html-query h12w Go

A fluent and functional approach to querying HTML

50 10 50

bloodhound vitorfs Python

50 36 50

nasty lschmelzeisen Python

NASTY Advanced Search Tweet Yielder

50 8 50

Mini-Spider zhangyunhao116 Python

A simple, practical crawler tool: create your own crawler program in just four steps!

50 23 50

python-crawler dateolive Python

A repository for learning web crawling, suitable for people with no prior experience and friendly to beginners

50 14 50

kepub TerakomariGandesblood C++

Crawl novels from sfacg, ciweimao, esjzone, lightnovel and masiro; generate, append and extract epub

50 14 50

armiarma migalabs Go

Armiarma is a Libp2p open-network crawler with a current focus on Ethereum's CL network

50 16 50

nextcrawler g089h515r806 JavaScript

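Next Crawler is a web data collector built with mainstream technologies such as Playwright, Next.js, and Prisma. Once configured through a visual UI, it periodically drives a browser via Playwright to crawl web pages...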

50 6 50

x12306 0xHJK Python

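12306 ticket lookup assistant: query every station along a route with one click, board first and pay the fare difference later, and make your trip more worry-free.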

50 13 50

URLBrute-Py ReddyyZ Python

Tool to brute-force website subdomains and directories.

49 8 49

fii riquellopes HTML

API for retrieving information about FII

49 11 49

AzureSearchCrawler thomas11 C#

A simple web crawler, using Abot, that indexes page contents into Azure Search.

49 19 49

subscan eredotpkfr Rust

⚡ A subdomain enumeration tool leveraging diverse techniques, designed for advanced pentesting operations

49 2 49

NeedFree InJeCTrL Python

Crawl 100%-discount games on Steam

49 5 49

Dream11_Leaderboard mochatek Python

Python script that exports the leaderboard, along with the corresponding team details, of the Dream11 contest you are participating in to an Excel sheet as soon as th...

49 39 49