Most popular crawler repositories and open source projects

scrapy-picture-spider SylvanasSun Python

A spider that uses Scrapy and beautifulsoup4 to crawl pictures.

28 6 28

ds-video-helper kyxw007 JavaScript

Synology Video Station helper that automatically fetches Douban movie information and fills in the Video Station video metadata.

28 4 28

spider GeoffZhu JavaScript

A web spider framework

28 6 28

konect-extr kunegis MATLAB

Network dataset extraction library – part of the KONECT project by Jérôme Kunegis, University of Namur

28 13 28

spider cyclone-github Go

Spider - web crawler and local wordlist processor to generate frequency sorted wordlist / ngrams

28 2 28

yt-comments-crawler rdavydov JavaScript

Browser extension that extracts all comments from the YouTube video page, sorts them by the amount of likes and saves them to a csv file.

28 9 28

spider_man feng19 Elixir

SpiderMan, a fast, high-level web crawling & scraping framework for Elixir, based on Broadway.

28 5 28

CocoCrawler Marcel0024 C#

A declarative and easy-to-use web crawler and scraper in C#

28 5 28

supa-crawl-chat bigsk1 Python

Integrates Supabase with Crawl4AI and AI Chat to create a powerful web crawling and semantic search solution. Streamlit supabase data visualization. R...

28 5 28

CrawlnChat jroakes Python

A modular web crawling and chat system that allows for ingesting website content through XML sitemaps, converting to vector embeddings, and providing...

28 0 28

crawlit drogbadvc Python

A web crawler based on Scrapy, with 2D visualization and PageRank

28 10 28

webevaluator Aman-Codes JavaScript

A web crawling tool which tests websites for SSL, Cookies and ADA compliance and also suggests ways to fix them.

28 10 28

python-crawl jcesarstef Python

Library to crawl and extract internal links from a domain

27 8 27

job-funnel-ts alehkot TypeScript

Automated tool for scraping job postings into .xlsx files, inspired by Job Funnel.

27 0 27

soccer-scrape o8e JavaScript

Scrape football data from Bet365

27 27 27

udemyscraper sortedcord Python

A Udemy course scraper built with bs4 and Selenium that fetches Udemy course information. Get Udemy course information and convert it to json, csv or...

27 11 27

social-media-archiver Combo819 TypeScript

A Node.js template to implement for archiving posts from any social media platform.

27 5 27

serverless-crawler-demo novemberde JavaScript

Serverless Architecture Crawler demo

26 8 26

PY-Login PY-Trade Python

Simulates logging in to various websites and uses their APIs to do all sorts of unmentionable things

26 10 26

pimcore-lucene-search dachcom-digital PHP

Pimcore Website Indexer (powered by Zend Search Lucene)

26 18 26

od-database-crawler terorie Go

OD-Database Go crawler

26 5 26

nivinEdu nivin-studio

拟物校园, an open-source mobile solution for university academic administration.

26 10 26

CrawlerDetectBundle nicolasmure PHP

A Symfony bundle for the Crawler-Detect library (detects bots/crawlers/spiders via the user agent)

26 11 26

Real_Time_Social_Media_Mining stormsinbrewing HTML

DevOps pipeline for Real Time Social/Web Mining

26 11 26

douyin-sdk Video-Hub Python

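Contact via WeChat (1764328791). Douyin SDK, Douyin data, Douyin live-stream data, Douyin live-stream API, Douyin video API, Douyin crawler, Douyin watermark removal, Douyin video download, Douyin video parsing, Douyin live-stream monitoring, ...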

26 9 26

cambridge mhwgoo Python

Terminal version of Cambridge Dictionary by default. Also supports Merriam-Webster Dictionary.

26 4 26

tucan-tools tucanlib Python

Nomen est omen. It exports tucan grades/vv etc.

25 4 25

ProxyCrawler WeihanLi C#

Proxy crawler service that crawls proxy IPs and saves them to Redis; built with Topshelf + Quartz.NET + Redis

25 11 25

CyberCrowl tnmch Python

CyberCrowl is a Python web path scanner tool

25 10 25

master-to-pythonista phuocding Python

A list of awesome beginner-friendly projects.

25 21 25

marmot hunterhug Go

💐Marmot A Golang HTTP Download

25 13 25

wind-bell yishuifengxiao Java

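风铃虫 (wind-bell) is a lightweight crawler tool, as sensitive as a wind chime and as nimble as a spider, able to sense the slightest disturbance and easily fetch content from the internet. It is a spider program that is relatively friendly to target servers...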

25 8 25

Techweekly xiongwilee JavaScript

A highly configurable tool for pushing technical weekly newsletters by email

24 4 24

zhihu-crawler pithyone PHP

A lightweight Zhihu crawler that supports questions, collections, and this month's hottest posts

24 11 24

realestate-scraper pauloromeira Python

A scraper that gathers data from real estate ads

24 16 24

FacePlusPlus-Stars-Library-Images-Crawler qibinlou Python

Crawler and image collection for the Face++ starlib celebrity library's labeled portrait set, for face recognition training

24 17 24

PaperCrawler JustJokerX Python

Crawler used to crawl papers

24 9 24

AndroidValidatorCrawler AliAzaz Kotlin

A Kotlin library: a validator box that can inspect any type of form and provides multiple validation functions, including clearing views

24 4 24

dht owenliang Go

A DHT crawler

24 6 24

collector-filesystem Norconex Java

Norconex Filesystem Collector is a flexible crawler for collecting, parsing, and manipulating data ranging from local hard drives to network locations...

24 12 24