Most popular crawling repositories and open source projects

talospider

talospider - A simple, lightweight scraping micro-framework

4   55   55  

learn.scrapinghub.com

Scrapinghub Learning Center. Report issues in Jira: Report issues in J...

24   55   55  

rag-web-browser

RAG Web Browser is an Apify Actor to feed your LLM applications and RA...

10   55   55  

diffbot-php-client

[Deprecated - Maintenance mode - use APIs directly please!] The offici...

20   53   53  

flink-crawler

Continuous scalable web crawler built on top of Flink and crawler-comm...

18   52   52  

Deepminer

Deep web crawler and search engine

13   52   52  

fuckvkeypad

Virtual keyboard (vKeypad) bypass tool

7   52   52  

bilib

A Python scraping library that integrates several native Bilibili APIs and combines them with crawling techniques

2   50   50  

billboard-json

🎧 Get json type billboard hot 100 chart

6   50   50  

socials

👨‍👩‍👦 Social account detection and extraction in Python, e.g. for c...

9   47   47  

thecrowler

A Content Discovery and Development Platform. Empowering Cybersecurity...

9   47   47  

web-languages

Crowd-sourced lists of urls to help Common Crawl crawl under-resourced...

56   46   46  

covid-social-analysis

Apply ML to Weibo sentiment. Sentiment analysis and visualization of Weibo text during the COVID-19 pandemic

5   46   46  

jason-the-miner

⛏ A versatile Web scraper for Node.js

11   45   45  

auctus

Dataset search engine, discovering data from a variety of sources, pro...

9   45   45  

scaling-to-distributed-crawling

Repository for the Mastering Web Scraping in Python: Scaling to Distri...

9   44   44  

warcworker

A dockerized, queued high fidelity web archiver based on Squidwarc

7   43   43  

scrape-github-trending

Tutorial for web scraping / crawling with Node.js.

8   43   43  

bluebird

Unofficial Python client for Twitter

14   43   43  

EngineeringTeam

A place for organizing the YBIGTA engineering team's materials.

10   42   42  

Raven

Raven is a powerful and customizable web crawler written in Go.

7   42   42  

webtranspose

Web scraping API for building AI applications.

2   41   41  

Coupang-Review-Crawling

Coupang review crawling

27   41   41  

crawl-data-api

justoneapi Data API Services. We provide APIs for: Xiaohongshu, Red, R...

4   39   39  

podcastcrawler

PHP library to find podcasts

10   39   39  

XingDumper

Python 3 script to dump/scrape/extract company employees from XING API

5   38   38  

DarkWeb-Crawling-Indexing

A DarkWeb Crawler based off the open-source TorSpider. Indexing with s...

15   38   38  

sneakpeek

Sneakpeek is a framework that helps to quickly and conveniently develo...

0   37   37  

mal-analysis

GitHub repo for MyAnimeList analysis. Also links to the MAL dataset.

8   34   34  

pdf_downloader

A Scrapy Spider for downloading PDF files from a webpage.

14   34   34  

serverless-instagram-crawler

Serverless Instagram hashtag crawler with Lambda and DynamoDB

9   33   33  

BaiduSpider

The project has moved to: https://github.com/BaiduSpider/BaiduSpider !! A...

13   33   33  

video-crawler

Crawl websites for videos from YouTube, Vimeo, SoundCloud, etc.

5   32   32  

serritor

Serritor is an open source web crawler framework built upon Selenium a...

14   32   32  

spidyquotes

Example site for web scraping tutorials

18   31   31  

squirm

This was the night of the crawling terror!

2   31   31  

NetExtract

NetExtract: Efficiently extract core content from any webpage and conv...

3   30   30  

ferret-server

Advanced declarative web scraping

6   30   30  

CrowLeer

Powerful C++ web crawler based on libcurl

4   29   29  

ProductHunt-scraper

Scraper script for the well-known website Producthunt.com. Scrapes all offers and sa...

9   28   28  

AyugeSpiderTools

Scrapy extension library: its main purpose is to let Scrapy development proceed without worrying about item, pipeline, middle...

3   27   27  

puppet-master

Puppeteer as a service hosted on Saasify.

8   26   26  

amharic_spell_corrector

Amharic Spelling Corrector based on SymSpell - Spelling corrector whic...

12   26   26  

billboard-player

🎹 Free billboard hot 100 M/V streaming service

10   26   26  

translators

🌐 Comparison of Google, Papago, and Kakao Translator

5   26   26  

popular_restaurants_from_officials

A project to find genuinely good restaurants by analyzing the business expense records of Seoul city officials

7   25   25  

botasaurus-starter

🚀 OFFICIAL STARTER TEMPLATE FOR BOTASAURUS SCRAPING FRAMEWORK 🤖

9   25   25  

scrapy-requests

Scrapy middleware to handle JavaScript pages using requests-html

1   25   25  

crawlkit

A crawler based on Phantom. Allows discovery of dynamic content and su...

7   24   24  

SentimentPoliticalCompass

Framework to analyze newspapers with respect to their political convic...

3   24   24