Topic

crawling

Repositories (1350)

datacrawl
datacrawl DataCrawl-AI Python

A simple and easy to use web crawler for Python

64
Python-Crawling-Tutorial
Python-Crawling-Tutorial afunTW Jupyter Notebook

Python crawling tutorial

62
crawling-projects
crawling-projects guptachetan1997 Python

Web scraping and automation using Python

61
custom-crawler
custom-crawler rollrat C#

🌌 High productivity semi-automatic crawler generator 🛠️🧰

60
scrapy-distributed
scrapy-distributed Insutanto Python

A series of distributed components for Scrapy. Including RabbitMQ-based components, Kafka-based components, and RedisBloom-based components for Scrapy...

60
crawl-data-api
crawl-data-api justoneapi

justoneapi Data API Services. We provide APIs for: Xiaohongshu, Red, Redbook, Rednote, Taobao, JD.com, Douyin (E-commerce), Douyin (Videos), Kuaishou,...

60
pomp
pomp estin Python

Screen scraping and web crawling framework

59
proxycrawl-python
proxycrawl-python crawlbase Python

ProxyCrawl Python library for scraping and crawling

58
talospider
talospider howie6879 Python

talospider - A simple, lightweight scraping micro-framework

57
anti_bot_scraper
anti_bot_scraper HarimxChoi Python

✨Open-source Anti-Bot Scraper(Naver-Land)✨

57
billboard-json
billboard-json KoreanThinker TypeScript

🎧 Get the Billboard Hot 100 chart as JSON

57
supacrawler
supacrawler supacrawler Go

Supacrawler's ultralight engine for scraping and crawling the web. Written in Go for maximum performance and concurrency.

54
vkeypad-bypass
vkeypad-bypass soulee-dev Python

Virtual keyboard (vKeypad) bypass tool

54
diffbot-php-client
diffbot-php-client Swader PHP

[Deprecated - Maintenance mode - use APIs directly please!] The official Diffbot client library

53
learn.scrapinghub.com
learn.scrapinghub.com scrapinghub CSS

Scrapinghub Learning Center. Report issues in Jira: https://scrapinghub.atlassian.net/projects/WEB

53
flink-crawler
flink-crawler kkrugler Java

Continuous scalable web crawler built on top of Flink and crawler-commons

53
socials
socials lorey Python

👨‍👩‍👦 Python library and CLI to turn URLs into structured social media profiles.

53
thecrowler
thecrowler pzaino Go

A Content Discovery and Development Platform. Empowering Cybersecurity, AI, Marketing, and Finance professionals and researchers to discover, analyze,...

52
Deepminer
Deepminer Conso1eCowb0y Python

Deep web crawler and search engine

52
bilib
bilib OlafZhang Python

A Python scraping library that integrates multiple native Bilibili APIs and combines them with crawling techniques

50
Coupang-Review-Crawling
Coupang-Review-Crawling JaehyoJJAng Python

Coupang review crawling

48
scaling-to-distributed-crawling
scaling-to-distributed-crawling ZenRows HTML

Repository for the Mastering Web Scraping in Python: Scaling to Distributed Crawling blogpost with the final code.

46
covid-social-analysis
covid-social-analysis lunarwhite HTML

Apply ML to Weibo sentiment. Sentiment analysis and visualization of Weibo text in the context of the pandemic

46
jason-the-miner
jason-the-miner mawrkus JavaScript

⛏ A versatile Web scraper for Node.js

45
bluebird
bluebird labteral Python

Unofficial Python client for Twitter

44
auctus
auctus VIDA-NYU Python

Mirror from: https://gitlab.com/ViDA-NYU/auctus/auctus

44
warcworker
warcworker peterk Python

A dockerized, queued high fidelity web archiver based on Squidwarc

43
scrape-github-trending
scrape-github-trending transitive-bullshit JavaScript

Tutorial for web scraping / crawling with Node.js.

43
EngineeringTeam
EngineeringTeam YBIGTA

A place for organizing the YBIGTA engineering team's materials.

42
DarkWeb-Crawling-Indexing
DarkWeb-Crawling-Indexing AshwinAmbal HTML

A DarkWeb Crawler based off the open-source TorSpider. Indexing with search engine created using Apache Solr.

42
Raven
Raven Symbolexe Go

Raven is a powerful and customizable web crawler written in Go.

41
podcastcrawler
podcastcrawler podcastcrawler PHP

PHP library to find podcasts

40
webtranspose
webtranspose mike-gee Python

Web scraping API for building AI applications.

40
XingDumper
XingDumper l4rm4nd Python

Python 3 script to dump/scrape/extract company employees from XING API

38
sneakpeek
sneakpeek flulemon Python

Sneakpeek is a framework that helps to quickly and conveniently develop scrapers. It’s the best choice for scrapers that have some specific complex sc...

37
botasaurus-starter
botasaurus-starter omkarcloud TypeScript

🚀 OFFICIAL STARTER TEMPLATE FOR BOTASAURUS SCRAPING FRAMEWORK 🤖

34
serverless-instagram-crawler
serverless-instagram-crawler kimcoder TypeScript

Serverless Instagram hashtag crawler with Lambda and DynamoDB

34
BaiduSpider
BaiduSpider samzhangjy Python

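The project has moved to: https://github.com/BaiduSpider/BaiduSpider !! A crawler for scraping Baidu search results; currently supports Baidu web search, Baidu image search, Baidu Zhidao search, Baidu...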

34
pdf_downloader
pdf_downloader alaminopu Python

A Scrapy Spider for downloading PDF files from a webpage.

34
scrapingai
scrapingai Agenty TypeScript

Build web scraping agents using AI to auto-extract the data from websites, capture screenshot, generate pdf from URL and web crawling with Agenty

34
mal-analysis
mal-analysis racinmat Jupyter Notebook

GitHub repo for MyAnimeList analysis. Also links to the MAL dataset.

33
serritor
serritor peterbencze Java

Serritor is an open source web crawler framework built upon Selenium and written in Java. It can be used to crawl dynamic web pages that require JavaS...

33
squirm
squirm squirm-framework Crystal

This was the night of the crawling terror!

32
video-crawler
video-crawler garysieling Scala

Crawl websites for videos from YouTube, Vimeo, SoundCloud, etc.

32
NetExtract
NetExtract sabber-slt TypeScript

NetExtract: Efficiently extract core content from any webpage and convert it to clean, LLM-optimized Markdown with a simple API.

32
ProductHunt-scraper
ProductHunt-scraper fernandod1 Python

Scraper script for the well-known Producthunt.com website. Scrapes all offers and saves them to an Excel spreadsheet file.

32
spidyquotes
spidyquotes zytedata Julia

Example site for web scraping tutorials

31
CrowLeer
CrowLeer erap320 C

Powerful C++ web crawler based on libcurl

31
ferret-server
ferret-server MontFerret Go

Advanced declarative web scraping

30
billboard-player
billboard-player krtk-dev TypeScript

🎹 Free Billboard Hot 100 M/V streaming service

29