Topic

crawler

Repositories (1431)

uforall
uforall rix4uni Go

uforall is a fast URL crawler that collects URLs from a number of different sources: AlienVault, Wayback Machine, urlscan, and Common Crawl

54
flink-crawler
flink-crawler kkrugler Java

Continuous scalable web crawler built on top of Flink and crawler-commons

53
browser-as-a-service
browser-as-a-service hfreire JavaScript

A web browser 🌎 hosted as a service, to render your JavaScript web pages as HTML

53
PageParser
PageParser mouday Python

A web page parser for crawlers; lets you write a crawler even if you don't know how to parse web pages

53
python-scrapfly
python-scrapfly scrapfly Python

Scrapfly Python SDK for headless browsers and proxy rotation

53
fb-page-chat-download
fb-page-chat-download eisenjulian Python

Python script to download messages from a Facebook page to a CSV file

52
MahjongKit
MahjongKit erreurt Python

Riichi Mahjong Kit: (1) Game log crawler (sqlite3, json, bs4); (2) Game log preprocessor; (3) Deterministic algorithms library

52
SearchX
SearchX LanyuanXiaoyao-Studio Vue

A rule-based, cross-platform, one-stop aggregated search tool

52
go-crawler-distributed
go-crawler-distributed golang-collection Go

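A distributed crawler project. It supports secondary development of customized page parsers, uses a microservice architecture overall, sends messages asynchronously through a message queue, and uses frameworks including redigo, gorm, goquer...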

52
Deepminer
Deepminer Conso1eCowb0y Python

Deep web crawler and search engine

52
thecrowler
thecrowler pzaino Go

A Content Discovery and Development Platform. Empowering Cybersecurity, AI, Marketing, and Finance professionals and researchers to discover, analyze,...

52
baidu-chain-dog
baidu-chain-dog CoolAcsi Java

A crawler for Baidu's Laici Dog (莱茨狗).

51
GPlayCrawler
GPlayCrawler KopLyf Python
51
alipay-crawler
alipay-crawler he426100 PHP

Alipay bill crawler

51
scrapy.dart
scrapy.dart sachaarbonel Dart

Scrapy, a fast high-level web crawling & scraping framework for Dart and Flutter

51
TwitterCrawler
TwitterCrawler casolxia Java

A Twitter crawler that fetches Twitter data filtered by conditions such as time, topic, and username

51
usetube
usetube valerebron TypeScript

Search and get data from YouTube; no Google account needed

51
tech-stack-datasets
tech-stack-datasets leadita

Open datasets of companies & websites grouped by technologies they use (CSV & JSON). Discover who uses Shopify, Stripe, Woocommerce, HubSpot, and more...

51
facebook-messenger-bot-tutorial
facebook-messenger-bot-tutorial twtrubiks Python

A Facebook Messenger bot tutorial using Python and Django

50
Timbr_V1
Timbr_V1 lvyachao JavaScript

A web service that turns an arbitrary web page into structural JSON data and easy-to-use APIs with just a few clicks

50
html-query
html-query h12w Go

A fluent and functional approach to querying HTML

50
bloodhound
bloodhound vitorfs Python
50
nasty
nasty lschmelzeisen Python

NASTY Advanced Search Tweet Yielder

50
Mini-Spider
Mini-Spider zhangyunhao116 Python

A simple, practical crawler tool: create your own crawler in just four steps!

50
python-crawler
python-crawler dateolive Python

A repository for learning web crawling, suitable for complete beginners and friendly to newcomers

50
kepub
kepub TerakomariGandesblood C++

Crawl novels from sfacg, ciweimao, esjzone, lightnovel and masiro; generate, append and extract epub

50
armiarma
armiarma migalabs Go

Armiarma is a Libp2p open-network crawler with a current focus on Ethereum's CL network

50
nextcrawler
nextcrawler g089h515r806 JavaScript

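Next Crawler is a web data collector built with mainstream technologies such as Playwright, Next.js, and Prisma. Once configured through a visual UI, it periodically drives a browser through Playwright to crawl web pages...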

50
x12306
x12306 0xHJK Python

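A 12306 ticket-lookup assistant: query all stations along a route in one click, board first and pay the fare difference afterwards, for more worry-free travel.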

50
URLBrute-Py
URLBrute-Py ReddyyZ Python

Tool to brute-force website subdomains and directories.

49
fii
fii riquellopes HTML

API for retrieving information about FIIs (Brazilian real estate investment funds)

49
AzureSearchCrawler
AzureSearchCrawler thomas11 C#

A simple web crawler, using Abot, that indexes page contents into Azure Search.

49
subscan
subscan eredotpkfr Rust

⚡ A subdomain enumeration tool leveraging diverse techniques, designed for advanced pentesting operations

49
NeedFree
NeedFree InJeCTrL Python

Crawl 100%-discount games on Steam

49
Dream11_Leaderboard
Dream11_Leaderboard mochatek Python

Python script to get the leaderboard, along with the corresponding team details, of the Dream11 contest we are participating in into an Excel sheet as soon as th...

49
crawler
crawler ReedD JavaScript

Chromium / Puppeteer site crawler

48
douyin-crawler
douyin-crawler GoldArowana Java

Douyin crawler. Crawls a user's posts and likes through a mobile phone proxy

48
DouYinSDK
DouYinSDK 01ly Python

Douyin SDK: data collection and crawling are no longer just a dream

48
stock_linebot_public
stock_linebot_public ChenTsungYu Python

The project for Linebot

48
nhentai-imgcollect
nhentai-imgcollect chenyuqin-dlut Python

🚀 A multithreaded Python nhentai crawler with a PyQt5 GUI

48
instagram_scraper
instagram_scraper jbinfo

Extract Instagram user information from hashtags. This scraper can extract email addresses from the Bio section and the business email.

48
logo-scrape
logo-scrape fritzh321 TypeScript

🕷🚀 Scrapes/Crawls the logo from a provided url(s)/website for your Node.js applications.

48
local-api-client-python
local-api-client-python kameleo-io Python

Official Python library for interacting with Kameleo Client

48
codes-scratch-crawler
codes-scratch-crawler duoan Java

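Reading notes on the book 《自己动手写网络爬虫》 (Write Your Own Web Crawler), with hand-typed code. Mainly covers basic web crawler implementation, web page deduplication algorithms, web page fingerprinting algorithms, and text information mining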

47
httpseed
httpseed bitcoinj Kotlin

Cartographer: A new type of seed for the Bitcoin network

47
USTBCrawlers
USTBCrawlers nladuo Python

Those years I spent crawling USTB. A focused-crawler tutorial that progresses from the basics to advanced topics.

47
INMET-API-temperature
INMET-API-temperature fabinhojorge Python

Crawler for meteorological data from INMET's conventional stations (BDMEP)

47
Awesome-Scrapy
Awesome-Scrapy Threekiii Python

A Scrapy-based code repository of data-collection crawlers

47
pyfutebol
pyfutebol vinigracindo Python

Simple crawler for getting football match results

47
robotstester
robotstester p0dalirius Python

This Python script can enumerate all URLs present in robots.txt files, and test whether they can be accessed or not.

47