Most popular crawling repositories and open source projects

datacrawl DataCrawl-AI Python

A simple and easy to use web crawler for Python

64 11 64

Python-Crawling-Tutorial afunTW Jupyter Notebook

Python crawling tutorial

62 24 62

crawling-projects guptachetan1997 Python

Web scraping and automation using Python

61 15 61

custom-crawler rollrat C#

🌌 High productivity semi-automatic crawler generator 🛠️🧰

60 4 60

scrapy-distributed Insutanto Python

A series of distributed components for Scrapy. Including RabbitMQ-based components, Kafka-based components, and RedisBloom-based components for Scrapy...

60 11 60

crawl-data-api justoneapi

justoneapi Data API Services. We provide APIs for: Xiaohongshu, Red, Redbook, Rednote, Taobao, JD.com, Douyin (E-commerce), Douyin (Videos), Kuaishou,...

60 7 60

pomp estin Python

Screen scraping and web crawling framework

59 10 59

proxycrawl-python crawlbase Python

ProxyCrawl Python library for scraping and crawling

58 19 58

talospider howie6879 Python

talospider - A simple, lightweight scraping micro-framework

57 4 57

anti_bot_scraper HarimxChoi Python

✨Open-source Anti-Bot Scraper(Naver-Land)✨

57 7 57

billboard-json KoreanThinker TypeScript

🎧 Get the Billboard Hot 100 chart as JSON

57 8 57

supacrawler supacrawler Go

Supacrawler's ultralight engine for scraping and crawling the web. Written in Go for maximum performance and concurrency.

54 5 54

vkeypad-bypass soulee-dev Python

Virtual keyboard (vKeypad) bypass tool

54 8 54

diffbot-php-client Swader PHP

[Deprecated - Maintenance mode - use APIs directly please!] The official Diffbot client library

53 20 53

learn.scrapinghub.com scrapinghub CSS

Scrapinghub Learning Center. Report issues in Jira: https://scrapinghub.atlassian.net/projects/WEB

53 23 53

flink-crawler kkrugler Java

Continuous scalable web crawler built on top of Flink and crawler-commons

53 18 53

socials lorey Python

👨‍👩‍👦 Python library and CLI to turn URLs into structured social media profiles.

53 9 53

thecrowler pzaino Go

A Content Discovery and Development Platform. Empowering Cybersecurity, AI, Marketing, and Finance professionals and researchers to discover, analyze,...

52 11 52

Deepminer Conso1eCowb0y Python

Deep web crawler and search engine

52 12 52

bilib OlafZhang Python

A Python crawling library that integrates multiple native Bilibili APIs and combines them with scraping techniques

50 2 50

Coupang-Review-Crawling JaehyoJJAng Python

Coupang review crawling

48 28 48

scaling-to-distributed-crawling ZenRows HTML

Repository for the Mastering Web Scraping in Python: Scaling to Distributed Crawling blogpost with the final code.

46 9 46

covid-social-analysis lunarwhite HTML

Apply ML to Weibo sentiment. Sentiment analysis and visualization of Weibo text during the COVID-19 pandemic

46 5 46

jason-the-miner mawrkus JavaScript

⛏ A versatile Web scraper for Node.js

45 11 45

bluebird labteral Python

Unofficial Python client for Twitter

44 14 44

auctus VIDA-NYU Python

Mirror from: https://gitlab.com/ViDA-NYU/auctus/auctus

44 9 44

warcworker peterk Python

A dockerized, queued high fidelity web archiver based on Squidwarc

43 7 43

scrape-github-trending transitive-bullshit JavaScript

Tutorial for web scraping / crawling with Node.js.

43 8 43

EngineeringTeam YBIGTA

A place for organizing the YBIGTA engineering team's materials.

42 10 42

DarkWeb-Crawling-Indexing AshwinAmbal HTML

A DarkWeb Crawler based off the open-source TorSpider. Indexing with search engine created using Apache Solr.

42 17 42

Raven Symbolexe Go

Raven is a powerful and customizable web crawler written in Go.

41 8 41

podcastcrawler podcastcrawler PHP

PHP library to find podcasts

40 10 40

webtranspose mike-gee Python

Web scraping API for building AI applications.

40 2 40

XingDumper l4rm4nd Python

Python 3 script to dump/scrape/extract company employees from XING API

38 5 38

sneakpeek flulemon Python

Sneakpeek is a framework that helps to quickly and conveniently develop scrapers. It’s the best choice for scrapers that have some specific complex sc...

37 0 37

botasaurus-starter omkarcloud TypeScript

🚀 OFFICIAL STARTER TEMPLATE FOR BOTASAURUS SCRAPING FRAMEWORK 🤖

34 8 34

serverless-instagram-crawler kimcoder TypeScript

Serverless Instagram hashtag crawler with Lambda and DynamoDB

34 10 34

BaiduSpider samzhangjy Python

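The project has moved to: https://github.com/BaiduSpider/BaiduSpider !! A crawler for scraping Baidu search results; currently supports Baidu web search, Baidu image search, Baidu Zhidao search, 百度视...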

34 13 34

pdf_downloader alaminopu Python

A Scrapy Spider for downloading PDF files from a webpage.

34 14 34

scrapingai Agenty TypeScript

Build web scraping agents using AI to auto-extract the data from websites, capture screenshot, generate pdf from URL and web crawling with Agenty

34 4 34

mal-analysis racinmat Jupyter Notebook

GitHub repo for MyAnimeList analysis. Also links to the MAL dataset.

33 8 33

serritor peterbencze Java

Serritor is an open source web crawler framework built upon Selenium and written in Java. It can be used to crawl dynamic web pages that require JavaS...

33 14 33